SCREENING METHODS IN EUCARYOTIC CELLS

CROSS-REFERENCE TO RELATED APPLICATION

Commonly owned, copending application 
08/442,641, filed May 11, 1995, describes related subject 
matter and is incorporated by reference in its entirety 
for all purposes. 

STATEMENT OF GOVERNMENT INTEREST 
This invention was made with Government support 
under Grant Nos. AI29135 and GM47478 awarded by the
National Institutes of Health. The Government has 
certain rights in this invention.

BACKGROUND 

In recent years, a number of in vitro methods have
been developed for screening polypeptides for a desired
binding specificity. For example, the phage display
technique screens polypeptides displayed as a coat
protein from a bacteriophage. See, e.g., Dower et al.,
WO 91/17271; McCafferty et al., WO 92/01047; Ladner, US
5,223,409 (incorporated by reference in their entirety
for all purposes). Another in vitro screening method has
been developed for isolating nucleic acid sequences that
bind to proteins. See Gold et al., U.S. 5,270,163.

In vivo methods for screening libraries have also
been reported. Such methods usually detect
protein-protein or protein-nucleic acid interactions
using reporter constructs to identify active members of a
library (Allen et al., 1995). For example, a yeast
three-hybrid system has been used to identify a cDNA
encoding a protein that binds the 3' end of histone
mRNAs, and bacterial reporter systems based on
transcriptional antitermination or translational
inhibition have been used to screen RNA-binding libraries
in several different contexts (Fouts et al., 1996; Harada
et al., 1996; Jain et al., 1996; Wilhelm et al., 1996).
These methods are especially useful for screening cDNA
expression libraries and, arguably, mimic physiological
conditions more closely than the in vitro methods.
Further, such methods allow screening for physiological
or functional properties as distinct from mere binding
activity. In vivo screens are usually performed in
bacteria or fungi, such as yeast, because of high
transformation efficiencies and because it is possible to
transform many cells and obtain individual clones.

Because many libraries are screened to obtain 
therapeutic compounds and mammalian cells more closely 
simulate the environment of intended therapeutic use than 
procaryotic, yeast or in vitro screens, it would be 
desirable to screen libraries in mammalian cells. For 
example, protein folding or posttranslational 
modification of some peptides may be different in 
eucaryotic and procaryotic cells. Screening in mammalian 
cells is, however, generally more difficult than 
screening in procaryotes because: (1) transfection 
efficiencies are lower; (2) unlike bacteria or yeast,
where plasmid segregation results in clonal colonies,
transfection of mammalian cells is believed to involve
the uptake of a large population of plasmids; (3) in
general, eucaryotic cells do not support episomal
replication of plasmids; (4) establishment of stable
eucaryotic cell lines with integrated vector is laborious
and may not result in expression of many library members;
and (5) recovery of a selected vector from eucaryotic cells
can be difficult.

Some progress has been reported to address these
difficulties. Transfection efficiencies have been
reported to be improved using liposomes, protoplasts, or
retroviruses as delivery vehicles (Schaffner, 1980;
Sandri-Goldin et al., 1981; Rassoulzadegan et al., 1982;
Felgner et al., 1987; Kitamura et al., 1995). Selected
plasmids have been reported to have been enriched from
pools introduced into a cell by transfection by
subdividing active pools and performing multiple rounds
of enrichment (Seed, 1995). Further, plasmid recovery
has been reported to be improved using episomally
replicating plasmids containing SV40, polyoma, or
Epstein-Barr Virus (EBV) origins in specific cell types
supporting episomal replication of such plasmids (Yates
et al., 1985; Seed et al., 1987; Kinsella et al., 1996).
Thus, for example, a cDNA expression cloning strategy has
been reported by Seed et al., 1987 using plasmids
containing an SV40 origin. This origin allows the
plasmids to replicate episomally to high levels in COS
cells, which express large T-antigen. SV40-vector
libraries were introduced into COS cells by protoplast
fusion, the libraries were amplified episomally,
cells expressing the CD2 surface antigen were isolated by
panning with antibodies, and amplified vectors were
recovered from Hirt supernatants of the isolated cells.

Despite some progress as noted above, difficulties
and limitations remain in screening libraries in
eucaryotic cells, and improved methods are needed.

DEFINITIONS 

A specific binding affinity of one entity for
another refers to a dissociation constant ≤ 10 µM,
preferably ≤ 100 nM and most preferably ≤ 10 nM. RNA
peptides isolated by the claimed methods bind one (or
more) RNA binding sites more strongly (i.e., at least
5-fold, 10-fold, 100-fold or 1000-fold) than others.
Dissociation constants as low as 1 nM, 1 pM or 1 fM are
possible for protein-RNA binding.
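
The affinity thresholds in this definition can be expressed as a short check. The following Python sketch is illustrative only: it reads the three thresholds as upper bounds of 10 µM, 100 nM and 10 nM, and the function names and tier labels are hypothetical, not terms used in this specification.

```python
def affinity_tier(kd_molar: float) -> str:
    """Classify a dissociation constant (in mol/L) against the three thresholds.

    The tier labels are hypothetical names chosen for this illustration.
    """
    if kd_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    if kd_molar <= 10e-9:       # at most 10 nM
        return "most preferred"
    if kd_molar <= 100e-9:      # at most 100 nM
        return "preferred"
    if kd_molar <= 10e-6:       # at most 10 uM
        return "specific"
    return "below threshold"


def fold_specificity(kd_target: float, kd_other: float) -> float:
    """Ratio by which the target site is bound more strongly than another site."""
    if kd_target <= 0 or kd_other <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_other / kd_target


if __name__ == "__main__":
    # A peptide binding its target at 1 nM and an unrelated site at 100 nM.
    print(affinity_tier(1e-9), fold_specificity(1e-9, 1e-7))
```

A lower dissociation constant means tighter binding, so the fold specificity is the ratio of the off-target constant to the on-target constant.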

A DNA segment is operably linked when placed into a
functional relationship with another DNA segment. For
example, a promoter is operably linked to a coding
sequence if it stimulates the transcription of the
sequence. Generally, DNA sequences that are operably
linked are contiguous, and in the case of two amino acid
coding sequences, both contiguous and in reading phase.
Linking is accomplished by ligation at convenient
restriction sites or at adapters or linkers inserted in
lieu thereof.

WO 98/44147    PCT/US98/05740

The term nucleic acid includes RNA, DNA and peptide
nucleic acids, single-stranded or double-stranded.

Peptide or polypeptide refers to a polymer in which
the monomers typically are alpha-(L)-amino acids joined
together through amide bonds. Peptides are at least two
and usually three or more amino acid monomers long. The
term protein is used to refer to a full-length natural
polypeptide or a synthetic polypeptide that is
sufficiently long to have a self-sustaining secondary
structure (e.g., α-helix or β-pleated sheet) and at least
one functional domain.

Random peptide refers to an oligomer composed of two
or more amino acid monomers and constructed by a means
with which one does not entirely preselect the complete
sequence of a particular oligomer.

A random peptide library refers not only to a set of
recombinant DNA vectors (also called recombinants) that
encodes a set of random peptides, but also to the set of
random peptides encoded by those vectors, as well as the
set of fusion proteins containing those random peptides.
Random peptide libraries frequently contain as many as
10^ to lO''^ different compounds. 

The lefthand direction of a polypeptide is the
amino-terminal direction and the righthand direction is the
carboxy-terminal direction, in accordance with
convention. Similarly, unless specified otherwise, the
lefthand end of single-stranded polynucleotide sequences
is the 5' end; the lefthand direction of double-stranded
polynucleotide sequences is referred to as the 5'
direction. The direction of 5' to 3' addition of nascent
RNA transcripts is referred to as the transcription
direction; sequence regions on the DNA strand which are
5' to the RNA transcript are referred to as "upstream
sequences"; sequence regions on the DNA strand which are
3' to the RNA transcript are referred to as "downstream
sequences."

A variant of a natural polypeptide usually exhibits
at least 20%, and more usually at least 50%, sequence
similarity to the natural polypeptide. The term sequence
similarity means peptides have identical or similar amino
acids (i.e., conservative substitutions) at corresponding
positions.
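
As an illustration of this definition, the following Python sketch scores sequence similarity between two peptides that are already aligned and of equal length. The grouping of amino acids into conservative classes is an assumption made for the example (this specification does not define the classes), and the function name is hypothetical.

```python
# Hypothetical conservative-substitution classes (one-letter codes); an assumption.
CONSERVATIVE_GROUPS = ("KRH", "DE", "ST", "NQ", "AVLIM", "FYW")


def _group(residue: str) -> str:
    """Return the class containing the residue, or the residue itself if unclassed."""
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            return group
    return residue


def percent_similarity(seq_a: str, seq_b: str) -> float:
    """Percent of corresponding positions with identical or similar residues."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    similar = sum(1 for a, b in pairs if a == b or _group(a) == _group(b))
    return 100.0 * similar / len(seq_a)


if __name__ == "__main__":
    # Lys/Arg and Asp/Glu swaps count as conservative under the assumed classes.
    print(percent_similarity("KRDE", "RKED"))
```

Under these assumed classes, a variant meets the "at least 50%" level when half or more of the aligned positions are identical or conservatively substituted.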

The polypeptides of the present invention are
obtained in a substantially pure form, typically being at
least 50% weight/weight (w/w) or higher purity, and being
substantially free of interfering proteins and
contaminants, such as those which may result from
expression in cultured cells. Preferably, the peptides
are purified to at least 80% w/w purity, more preferably
to at least 95% w/w purity. For use in pharmaceutical
compositions, the polypeptide purity should be very high,
typically being at least 99% w/w purity, and preferably
being higher.

SUMMARY OF THE INVENTION 
The invention provides methods of screening a
library of nucleic acid fragments in eucaryotic cells.
The library can be a natural library or a combinatorial
library. In some methods, the library of nucleic acid
fragments is transformed into primary cells, which are
procaryotic or fungi. The primary cells are cultured
under conditions in which the copy number of nucleic acid
fragments is amplified to an average of at least 200
copies per transformed cell. The transformed primary
cells are contacted with a population of eucaryotic cells
under conditions in which outer surfaces of the
transformed primary cells and eucaryotic cells fuse and
contents of the transformed primary cells including at
least some of the library of nucleic acid fragments are
transferred to the eucaryotic cells. The nucleic acid
fragments are screened in the eucaryotic cells to isolate
one or more eucaryotic cells having a desired property
conferred by one or more members of the library of nucleic
acid fragments, an expression product thereof, or a
secondary metabolite of an expression product. The one or
more eucaryotic cells thus isolated are lysed, releasing
the one or more members of the library of nucleic acids
conferring the desired property, and these nucleic acids
are electroporated into fresh procaryotic cells. The
fresh cells are then propagated to amplify the one or
more members of the library of nucleic acids, which
confer the desired property. Typically, a nucleic acid
fragment is isolated from the cells. Optionally, the
methods can be performed in a cyclic fashion in which the
amplified procaryotic cells of one cycle form the
transformed primary cells in the next cycle.

In some methods, the primary cells are E. coli, the
nucleic acid fragments are contained in a ColE1 vector,
and the primary cells are cultured in the presence of an
antibiotic to amplify the copy number of the library of
nucleic acid fragments. In some methods, the eucaryotic
cells lack capacity for episomal replication of the
transferred nucleic acid fragments. In some methods, the
nucleic acid fragments encode different peptides, and one
or more of the peptides confers the desired property in
the eucaryotic cells. In some methods, the nucleic acid
fragments encode enzymes, which produce secondary
metabolites in the procaryotic cells, which are
transferred together with at least some of the nucleic
acid fragments to the eucaryotic cells, and one or more
of the secondary metabolites confers the desired property
in the eucaryotic cells.

The nature of the screen depends on the desired
property. In some methods, in the screening step, the
eucaryotic cells contain a construct encoding a reporter
enzyme operably linked to a regulatory sequence.
Peptides can then confer the desired property by binding
to the regulatory sequence or a transcript thereof,
inducing expression of the reporter enzyme. The reporter
construct can be introduced into eucaryotic cells with
the library of nucleic acid fragments, or can be
transferred sequentially.

Some methods are used to screen expression products
of a library of nucleic acid fragments or secondary
metabolites thereof that are expressed in procaryotic
cells or fungi. In these methods, the library of nucleic
acid fragments is transformed into primary cells, which
are procaryotic cells or fungi. The cells are cultured
under conditions in which expression products and/or
secondary metabolites of expression products are
produced. The transformed cells are contacted with a
population of eucaryotic cells under conditions whereby
outer surfaces of the transformed procaryotic and
eucaryotic cells fuse and contents of the transformed
procaryotic cells including at least some of the library
of nucleic acid fragments, expression products thereof
and/or secondary metabolites of expression products are
transferred to the eucaryotic cells. At least some
eucaryotic cells receive an expression product and/or a
secondary metabolite thereof and a nucleic acid fragment
encoding the expression product. The eucaryotic cells
are screened to isolate one or more eucaryotic cells
having a desired property conferred by one or more of the
expression products or one or more of the secondary
metabolites produced in the primary cells. The one or
more members of the library of nucleic acids are
transferred from the one or more eucaryotic cells into
fresh procaryotic cells. The transformed fresh
procaryotic cells are propagated to amplify the one or
more members of the library of nucleic acids, which
produce the one or more expression products and/or one or
more secondary metabolites that confer the desired
property.

Some methods are used to screen peptides for
capacity to bind a selected RNA in eucaryotic cells.
Such methods entail introducing a library of nucleic
acids encoding fusion proteins into a population of
eucaryotic cells. Each such fusion protein comprises a
peptide linked to a transcriptional inducer, the peptides
varying between fusion proteins. The eucaryotic cells
further comprise a construct encoding a reporter gene
operably linked to a promoter from which expression is
stimulated by the transcriptional inducer and an RNA
binding site. One or more fusion proteins, each
comprising a peptide having specific affinity for the RNA
binding site, bind to the RNA binding site of the reporter
construct or a transcript thereof via the peptide, and
the transcriptional inducer linked to the peptide
stimulates expression of the reporter gene from the
promoter. One or more eucaryotic cells with stimulated
expression of the reporter gene are isolated. These
cells contain one or more nucleic acid fragments encoding
the one or more fusion proteins comprising a peptide
having specific affinity for the RNA binding site. In
some such methods, the transcriptional inducer is an HIV
Tat polypeptide and the promoter is an HIV LTR promoter.

BRIEF DESCRIPTION OF THE FIGURES 
Figure 1. Strategy for screening RNA-binding
libraries. Libraries are fused to the activation domain
of HIV-1 Tat (amino acids 1-48) or to full-length Tat and
are delivered into stable cells containing an appropriate
GFP reporter by protoplast fusion. GFP-expressing cells
are isolated by FACS sorting, and plasmids are extracted
by alkaline lysis and electroporated into bacteria.
Protoplasts are made from the enriched population and the
cycle is repeated until a large proportion of fused cells
express GFP. Individual clones are tested for activity
and positives are sequenced.

Figure 2. Activation of GFP expression by Tat
fusion proteins and corresponding RNA reporters. Cells
were lipofected with HIV TAR, RRE IIB, BIV TAR, or U1
hpII GFP reporters alone (left column), along with
Tat1-48 or Tat1-72 (middle column), or along with
full-length HIV Tat or Tat fusions to a Rev peptide, BIV
Tat peptide, or U1A RNA-binding domain, respectively
(right column). Rev and BIV Tat peptides were fused to
Tat1-48 whereas the U1A domain was fused to Tat1-72 to
ensure nuclear localization. Plots show relative GFP
fluorescence on the y-axis and relative side scatter (a
measure of cell granularity) on the x-axis for 10,000
cells.

Figure 3. Delivering plasmids by protoplast fusion
activates GFP expression. (A) FACS analysis of cells
containing a stably integrated HIV-1 LTR-GFP reporter
(reporter alone) or reporter cells fused with protoplasts
containing pSV2Tat1-48 or pSV2Tat1-72 plasmids, or fused
with protoplast mixtures containing pSV2Tat1-72 and
pSV2Tat1-48 in 1:1 (50%), 1:10 (10%), or 1:100 (1%)
ratios. Percentages refer to the proportion of
pSV2Tat1-72 in the mixture. (B) Plot of the percentage
of GFP-expressing cells as a function of the proportion
of pSV2Tat1-72 in the mixture, from (A). Based on the
percentage of positive cells obtained with pSV2Tat1-72
alone, about 10% of cells fused in this experiment.
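
The mixing experiment of Figure 3 can be reasoned about with a simple expectation. The Python sketch below is an illustrative model, not data from the figure. It assumes that a fixed fraction of reporter cells fuses (about 10% here) and that each fused cell receives the contents of a given number of protoplasts drawn at random from the mixture. The function and parameter names are hypothetical.

```python
def expected_positive_fraction(
    fusion_efficiency: float,
    active_proportion: float,
    protoplasts_per_cell: int = 1,
) -> float:
    """Expected fraction of reporter cells that become GFP-positive.

    fusion_efficiency    -- fraction of reporter cells that fuse at all
    active_proportion    -- proportion of protoplasts carrying the active plasmid
    protoplasts_per_cell -- protoplasts delivered to each fused cell (1 = clonal)
    """
    if not 0.0 <= fusion_efficiency <= 1.0:
        raise ValueError("fusion_efficiency must lie in [0, 1]")
    if not 0.0 <= active_proportion <= 1.0:
        raise ValueError("active_proportion must lie in [0, 1]")
    if protoplasts_per_cell < 1:
        raise ValueError("protoplasts_per_cell must be at least 1")
    # A fused cell is positive if at least one delivered protoplast is active.
    p_active = 1.0 - (1.0 - active_proportion) ** protoplasts_per_cell
    return fusion_efficiency * p_active


if __name__ == "__main__":
    for proportion in (1.0, 0.5, 0.1, 0.01):
        clonal = expected_positive_fraction(0.10, proportion)
        mixed = expected_positive_fraction(0.10, proportion, protoplasts_per_cell=10)
        print(f"{proportion:>5.2f}  clonal={clonal:.4f}  ten-per-cell={mixed:.4f}")
```

Under clonal delivery the positive fraction falls in direct proportion to the share of active protoplasts, whereas delivery of many protoplasts per cell would give a curve that flattens at high proportions. A plot such as panel (B) can distinguish the two cases.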

Figure 4. Plasmid recovery from FACS-sorted
GFP-positive cells. HIV-1 LTR-GFP reporter cells were
fused with protoplasts expressing Tat1-72 and positive
cells were sorted. Plasmids were extracted by
alkaline lysis and phenol extraction from 50-5000
positive cells, with 20,000 HeLa cells added as carrier,
and were electroporated into DH-5α cells. The number of
colonies produced was plotted against the number of
positive cells used for each plasmid preparation.

Figure 5. Enrichment of positive clones in a mock
library. HIV-1 LTR-GFP reporter cells were fused with a
protoplast mixture containing pSV2Tat1-72 and pSV2Tat1-48
in a 1:10^ ratio (Round 1). Fusions with each plasmid
separately (top) were used to set sorting windows for
positive and negative cells (boxes). Protoplasts made
from positive cells sorted in Round 1 were fused to
reporter cells (Round 2), and protoplasts made from
positive cells sorted in Round 2 were again fused to
reporter cells (Round 3). Individual clones from each
round were analyzed by PCR, as described in the text.

Figure 6. Screening an arginine-rich combinatorial
library for RRE binders. (A) The genetic code viewed
from the perspective of arginine-rich peptides. Amino
acids known to be important for specific RNA binding by
HIV-1 Rev, HIV-1 Tat, BIV Tat, and λ N peptides are
indicated in bold (Harada et al., 1996). A restricted
genetic code (bold box) encodes all charged and
hydrophilic residues, glycine, alanine, and proline, and
contains all six arginine codons. Combinations of these
amino acids in an arginine-rich context are expected to
encode a variety of helical and nonhelical RNA-binding
peptides. (B) A 14-amino acid Rev peptide (Rev14
corresponds to residues 34-47 of Rev, with Trp45->Arg and
Glu47->Arg substitutions, and specifically binds RRE IIB
RNA) and 14 arginines were fused to the HIV-1 Tat
activation domain (residues 1-49) in the context of
surrounding alanines as shown. In the library, 4
residues corresponding to non-arginine positions in Rev
were randomized with the amino acids encoded by the bold
box in (A) and are highlighted. Rev14 served as a
positive control and Arg14 as a negative control. (C)
FACS analysis of the reporter alone, cells fused to
negative and positive control protoplasts used to set
sorting windows (boxes), and library-containing
protoplasts carried through three rounds of sorting, as
described in Figure 5 and in the text. Individual clones
from Round 3 were tested for activity and positive clones
were sequenced.



Figure 7. Sequences of selected RRE binders.
Clones 1-6 were identified by selecting for high-level
GFP expression and were found in a total of 35
GFP-positive individual clones. Sequences in the bottom
section were identified from 18 positive clones after
four rounds of selection using a slightly lower sorting
window.



Figure 8. Activities of Tat fusion proteins on an
HIV-1 LTR RRE IIB-CAT reporter. (A) The Tat fusion
plasmids shown (1, 5, and 25 ng) were cotransfected with
the reporter (50 ng) and CAT activities were measured
after 48 hr. Fold activation was calculated as the ratio
of activities with and without the Tat-expression
plasmids. (B) Activities of Tat fusions containing
R6QR7, R6NR7, and the Rev14(N->Q) mutant, determined as
in (A).

DETAILED DESCRIPTION 

I. General

The invention provides new methods for
screening libraries of peptides and other compounds for a
desired property in eucaryotic cells. The methods are
premised, in part, on the unexpected observation that the
contents of procaryotic or lower eucaryotic cells, such
as yeast, can be transferred to recipient eucaryotic
cells in an essentially clonal manner by protoplast
fusion of the respective cells. That is, when a library
of protoplasts of procaryotic or lower eucaryotic cells
is fused with recipient eucaryotic cells, transfer
occurs by a mechanism in which most transfected recipient
cells (e.g., at least 50 or 75%) have received the
contents of only a single protoplast. The essentially
clonal transfer pertains even when protoplasts are
present in large excess over recipient cells, as is
necessary to maximize the proportion of recipient cells
undergoing transfection.

This result may occur because fusion is a low
frequency process so that on average few protoplasts
actually fuse, and/or because delivery of many plasmids
from one protoplast quickly saturates the cellular DNA
uptake system and prevents entry of additional plasmids.

Whatever the mechanism, near-clonal delivery is
advantageous for library screening because: (1) plasmids
conferring a desired property are transferred to
recipient cells in homogeneous form and can be isolated
from such cells without laborious procedures such as
subdividing active pools; (2) homogeneous transfer also
allows for detection of weakly active members that might
be missed if recipient cells received a mixture of
plasmids; and (3) false positives and negatives caused by
nonactive members delivered to the same cell as an active
member of a library are reduced.
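
The first of the two explanations offered above (fusion as a low-frequency event) can be made quantitative with a simple statistical sketch. The Python example below assumes, purely for illustration, that the number of protoplasts fusing with a given recipient cell follows a Poisson distribution. That distribution is an assumption of the example and is not asserted by this specification. The function name is hypothetical.

```python
import math


def single_protoplast_fraction(fused_fraction: float) -> float:
    """Fraction of fused cells that received exactly one protoplast.

    Assumes Poisson-distributed fusion events per recipient cell, with the
    mean chosen so that P(at least one fusion) equals fused_fraction.
    """
    if not 0.0 < fused_fraction < 1.0:
        raise ValueError("fused_fraction must lie strictly between 0 and 1")
    # P(k >= 1) = 1 - exp(-mean)  =>  mean = -ln(1 - fused_fraction)
    mean_fusions = -math.log(1.0 - fused_fraction)
    p_exactly_one = mean_fusions * math.exp(-mean_fusions)
    return p_exactly_one / fused_fraction


if __name__ == "__main__":
    for fused in (0.05, 0.10, 0.25, 0.50):
        share = single_protoplast_fraction(fused)
        print(f"{fused:.0%} of cells fused: {share:.1%} of those are clonal")
```

Under this assumption, the rarer the fusion event, the larger the share of fused cells that received a single protoplast, which is consistent with the near-clonal delivery described above. The saturation mechanism would require a different model.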

The feasibility of clonal transfer can be
exploited in several methods for screening libraries of
nucleic acids, peptides or other compounds in higher
eucaryotic cells. Such methods begin by transforming a
library of nucleic acid fragments into primary cells,
usually procaryotic. The transformed cells are then
usually cultured under conditions that allow
amplification of the molecule sought to be screened.
Preferably, the molecule is amplified to at least 100,
200, 500 or 1000 copies per cell. For example, nucleic
acids cloned in ColE1 vectors can be amplified to a copy
number of over 1000 per cell by treating the cell culture
with an antibiotic that inhibits protein synthesis.
Peptides encoded by nucleic acids or secondary
metabolites of such peptides can be amplified by
culturing the cells under appropriate nutritional
conditions. After amplification, protoplasts are formed
from primary cells and the protoplasts are contacted with
recipient eucaryotic cells under conditions in which the
cells fuse and the contents of the primary cells are
transferred to the recipient cells.

As noted, transfer occurs in an essentially
clonal fashion such that most transfected cells receive
the contents of a single protoplast. A transfected
eucaryotic cell thus typically receives multiple copies
of a member of the nucleic acid fragment library together
with an expression product of the nucleic acid fragment,
and in some instances, a secondary metabolite of the
expression product. Transfected eucaryotic cells are
allowed to recover from the fusion protocol, and are then
screened for a desired property conferred directly or
indirectly by a nucleic acid library member. Properties
are conferred indirectly if they are conferred by an
expression product of a nucleic acid library member or a
secondary metabolite of an expression product.

Eucaryotic cells having the desired property
are isolated from other cells, for example, by FACS
sorting. Nucleic acids are recovered from the isolated
cells and transferred, preferably by lysis and
electroporation, to further procaryotic cells. The cells
are then cultured to amplify the nucleic acids conferring
the desired property. Preferably, the method is
performed in a cyclic fashion with the transformed
procaryotic cells containing nucleic acid conferring the
desired property being used to form protoplasts for
fusion with further eucaryotic cells. Eventually,
nucleic acid fragments conferring the desired property,
their expression products or secondary metabolites are
characterized, e.g., by sequencing an isolated nucleic
acid fragment.
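
The effect of performing the method cyclically can be illustrated with a toy enrichment model. The Python sketch below assumes clonal delivery and a sorting step characterized by two hypothetical parameters: the fraction of true positives recovered in the sorting window and the fraction of negatives that fall into the window by chance. The numerical values are placeholders chosen for illustration and are not measurements from this specification.

```python
def enrichment_course(
    initial_fraction: float,
    rounds: int,
    recovery: float = 0.5,
    false_positive_rate: float = 0.005,
) -> list:
    """Fraction of active library members before and after each sorting round.

    initial_fraction    -- share of active members in the starting library
    recovery            -- share of active cells captured by the sorting window
    false_positive_rate -- share of inactive cells captured by the sorting window
    """
    if not 0.0 < initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie in (0, 1]")
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must lie in (0, 1]")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must lie in [0, 1]")
    fractions = [initial_fraction]
    fraction = initial_fraction
    for _ in range(rounds):
        true_hits = fraction * recovery
        false_hits = (1.0 - fraction) * false_positive_rate
        # Composition of the sorted pool, which seeds the next cycle.
        fraction = true_hits / (true_hits + false_hits)
        fractions.append(fraction)
    return fractions


if __name__ == "__main__":
    for round_number, share in enumerate(enrichment_course(1e-5, 3)):
        print(f"after round {round_number}: {share:.6f}")
```

In this model the odds of a clone being active are multiplied by recovery / false_positive_rate in every round, so a rare active member can come to dominate the pool after a few cycles of sorting, recovery and refusion.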

The methods described above have general
applicability for efficient screening of libraries in
higher eucaryotic cells. However, some applications are
particularly noteworthy. For example, the methods can be
used to screen nucleic acid libraries in eucaryotic cell
types that do not support episomal replication of
transferred nucleic acid. Episomal replication is not
needed because nucleic acids can be amplified to high
copy number in the primary cells before transfer to
eucaryotic cells occurs. Thus, transfected eucaryotic
cells receive sufficient copies of a nucleic acid library
member to allow screening and recovering after screening
to be effected without amplification of the copy number
of the library member in the eucaryotic cells.

Another application of the methods resides in
the screening of polypeptides or secondary metabolites
that are synthesized in the primary cells before transfer
to the eucaryotic cell. As previously noted, expression
products and/or secondary metabolites of a nucleic acid
library member can be transferred to eucaryotic cells
together with corresponding nucleic acid members encoding
the expression products. Thus, protoplast transfer
effectively preserves genetic linkage during screening
between a nucleic acid and peptide encoded by it or a
secondary metabolite thereof, in a manner analogous to a
phage-display system. Transferred peptides or secondary
metabolites are screened for a desired property in the
eucaryotic cells. Nucleic acid members are then
recovered from cells having the desired property, and the
identity of peptides or secondary metabolites having the
desired property can be determined indirectly, by
sequencing the nucleic acid or allowing the nucleic acid
to be expressed to produce a secondary metabolite, which
can then be characterized by conventional methods.

A further application of the above methods
resides in screening libraries of peptides for capacity
to bind to a selected RNA target in eucaryotic cells.
For example, such methods can be used to identify
peptides that bind tightly to an RNA sequence of an RNA
virus and thereby inhibit replication or expression of
the virus. Peptides are screened linked to a
transcriptional inducer as a fusion protein. Peptides
having activity for a selected RNA binding site bind to
a reporter construct containing that site and thereby
allow the transcriptional inducer to stimulate expression
of a reporter gene on the reporter construct.

II. Cell Types Usable in the Methods

The primary cells used for protoplast fusion
in the above methods are usually procaryotic but can also
be from lower eucaryotes such as yeast. The cells should
be transformable at high frequency and capable of forming
protoplasts. The cells should preferably also be capable
of supporting a high level of expression of the molecular
species sought to be screened in mammalian cells.
Typically, the cell types are those commonly used in
genetic engineering such as Bacillus, Escherichia coli,
Pseudomonas, Salmonella, actinomycetes, and yeast.

Eucaryotic cells suitable for screening are
usually cell types that can be grown in tissue culture
such as mammalian or plant cells. Suitable cells include
those from, e.g., mouse, rat, hamster, primate, and
human, both cell lines and primary cultures. Such cells
include stem cells, including embryonic stem cells and
hemopoietic stem cells, zygotes, fibroblasts,
lymphocytes, Chinese hamster ovary (CHO), mouse
fibroblasts (NIH3T3), kidney, liver, muscle, and skin
cells. Other eucaryotic cells of interest include plant
cells, such as maize, rice, wheat, cotton, soybean,
sugarcane, and tobacco.

III. Libraries 

The libraries of nucleic acids screenable by
the methods can be natural cDNA or genomic libraries or
can be combinatorial libraries in which one or more
positions in library members is varied in a systematic
manner between library members. Some libraries have
members which encode short random peptides about 6-25
amino acids long. Other libraries encode variant forms
of a naturally occurring protein.

In some methods, library members encode one or
more enzymes which alone, or together with other cellular
enzymes, catalyze the production of a secondary
metabolite. In this situation, library members encode
variant forms of an enzyme or cluster of enzymes, and
variation between the enzymes or clusters of enzymes in
different library members causes variation in secondary
metabolites produced in cells containing the enzymes.
Secondary metabolites include antibiotics, polyketides,
isoprenoids, vitamins, dyes, non-ribosomally-synthesized
peptides, enzymically modified peptides, and amino acids.
Strategies for mutation of genes encoding enzymes
involved in secondary metabolite production are described
by Hutchinson, Bio/Technology 12, 375-308 (1994).
Sources of cloned genes encoding enzymes involved in
antibiotic synthesis are reviewed by, e.g., Piepersberg,
Crit. Rev. Biotechnol. 14, 251-285 (1994); Chater,
Bio/Technology 8, 115-121 (1990). Examples of cloned
isoprenoid synthesis genes include trichodiene synthase
from Fusarium sporotrichioides, pentalene synthase from
Streptomyces, aristolochene synthase from Penicillium
roquefortii, and epi-aristolochene synthase from N.
tabacum (Cane, in Genetics and Biochemistry of Antibiotic
Production (ed. Vining & Stuttard, Butterworth-Heinemann,
1995), pp. 633-655). Production of secondary metabolites,
including the biodegradable plastic polyhydroxybutyrate
(PHB) and the polysaccharide xanthan gum, is reviewed by
Cameron et al., Applied Biochem. Biotech. 38, 105-140
(1993). Genes encoding enzymes that catalyze the
conversion of glucose to 2,5-keto-gluconic acid, and that
product to 2-keto-L-idonic acid, the precursor to
L-ascorbic acid, are reviewed by Boudrant, Enzyme Microb.
Technol. 12, 322-329 (1990).

In other situations, library members encode
RNA molecules which are to be screened in eucaryotic
cells. For example, the methods can be used to screen a
library of RNA molecules to identify those with affinity
for a specific protein target.

Typically, library members are cloned in a
vector allowing episomal replication and/or expression of
library members in the primary cells. In some methods,
the vector contains a ColE1 origin of replication
allowing amplification of vector copy number to high
levels (ca. 3000 copies per cell) by treatment with an
antibiotic that inhibits protein synthesis. The vector
may, but need not, contain a second origin of replication
to allow subsequent episomal replication in higher
eucaryotic cells. If such an origin is desired, an SV40
origin allows episomal replication in COS cells, and an
EBV origin together with a coding sequence for EBNA-1
protein allows replication in a variety of cell types,
but copy numbers are lower than with SV40-based plasmids.

Libraries are constructed by cloning an
oligonucleotide which contains the variable region of
library members (and any spacers and nonvariable
framework determinants) into the selected cloning site.
Using known recombinant DNA techniques (see generally,
Sambrook et al., Molecular Cloning, A Laboratory Manual,
2d ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., 1989, incorporated by reference in its
entirety for all purposes), an oligonucleotide may be
constructed which, inter alia, removes unwanted
restriction sites and adds desired ones, reconstructs the
correct portions of any sequences which have been removed
(such as a correct signal peptidase site, for example),
inserts the spacer, conserved or framework residues, if
any, and corrects the translation frame (if necessary) to
produce a fusion protein. A portion of the
oligonucleotide will generally contain one or more
variable region domain(s) and the spacer or framework
residues. The sequences are ultimately expressed as
peptides (with or without spacer or framework residues).
The variable region domain of the oligonucleotide
comprises the source of the library.

The size of the library varies according to
the number of variable codons, and hence the size of the
peptides, which are desired. Generally the library will
be at least about 10* or 10^ members, usually at least
10^, and typically 10® or more members. For example,
given the current efficiency of protoplast delivery
(10-20% of cells receive library members), the ability to
recover about one procaryotic colony per sorted
eucaryotic cell, and a practical limit of FACS sorting
(10^-10® cells), it is possible to screen libraries of
10^-10^ complexity with reasonable confidence, a level
sufficient to clone low-abundance cDNAs.
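
The sizing argument above can be made concrete with a rough calculation. The following Python sketch is illustrative only: it assumes that each sorted cell that received a library member carries one member drawn uniformly at random from the library, and the function names and numerical inputs are hypothetical, not values stated in this description.

```python
import math

def cells_with_member(cells_sorted: float, delivery_efficiency: float) -> float:
    """Expected number of sorted cells that actually received a library member."""
    return cells_sorted * delivery_efficiency

def fraction_of_library_sampled(cells_sorted: float,
                                delivery_efficiency: float,
                                library_size: float) -> float:
    """Expected fraction of distinct members seen at least once.

    Uses the Poisson approximation 1 - exp(-draws / library_size),
    which assumes uniform, independent sampling (an assumption).
    """
    draws = cells_with_member(cells_sorted, delivery_efficiency)
    return 1.0 - math.exp(-draws / library_size)

# Hypothetical inputs: 15% delivery (inside the 10-20% range quoted above),
# 1e7 cells sorted, and a library of 1e6 distinct members.
example_fraction = fraction_of_library_sampled(1e7, 0.15, 1e6)
```

Under these assumed inputs the number of productive cells is 1.5 times the library size, so most but not all members are expected to be sampled at least once; raising the number of sorted cells or lowering the library size improves coverage.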

Nucleic acids are introduced into primary
cells by standard methods depending on the cell type.
Electroporation is preferred for generating large
libraries in many bacterial cell types.

IV. Protoplast Fusion

Protoplast fusion is a method that transfers, at high
frequency, nucleic acids and other components from a
procaryotic or lower eucaryotic cell (e.g., fungus) to a
higher eucaryotic cell. A
protoplast results from the removal from a cell of its
cell wall, leaving a membrane-bound cell that depends on
an isotonic or hypertonic medium for maintaining its
integrity. If the cell wall is partially removed, the
resulting cell is strictly referred to as a spheroplast,
and if it is completely removed, as a protoplast.
However, here the term protoplast includes spheroplast
unless otherwise indicated.

The method involves two steps: conversion
of the procaryote or lower eucaryote to protoplasts, and
fusion of the protoplasts to the higher eucaryotic
cells. Protoplast fusion was first described by Schaffner
et al., Proc. Natl. Acad. Sci. USA 77, 2163 (1980).
Other exemplary procedures are described by Yoakum et
al., US 4,608,339, Takahashi et al., US 4,677,066 and
Sambrook et al., supra at Ch. 16. The first step is
typically effected by digestion of cell walls with
lysozyme in a 10-20% sucrose, 50 mM EDTA buffer.
Conversion of rod-shaped cells to spherical protoplasts
can be monitored by phase-contrast microscopy.
Protoplasts are then centrifuged onto a layer of
eucaryotic cells, usually with the protoplasts in excess
(up to about 100, 1000 or 10,000-fold). PEG is added to
promote cell fusion. The transfected eucaryotic cells
are sometimes propagated in antibiotic media to kill any
bacterial cells surviving protoplast fusion. A method of
cell fusion employing electric fields has also been
described. See Chang, US 4,970,154.

V. Screening in Higher Eucaryotic Cells 

After protoplast fusion, nucleic acid library
members, their RNA or peptide expression products, or
secondary metabolites are screened for a desired
property. Examples of desired properties that can be
screened for include encoding a cell surface antigen, a
capacity to bind to a selected target, which can be a
cellular protein or nucleic acid, a capacity to stimulate
or inhibit a cellular process, and toxicity to the recipient
cell. The nature of the screen depends on the desired
property. Libraries can be screened for a member
encoding a cell surface protein of interest by screening
cells with an antibody or other binding partner of the
surface protein. Cells killed by a toxic
expression product of a transferred sequence can be
distinguished by trypan blue exclusion. Other screens
detect a new enzymic activity conferred by a library
member or a capacity of the cells to grow without an
otherwise essential nutrient.

Screens for a specific binding affinity or for
effect on a cellular metabolic process can often be
devised in which active library members cause enhanced
expression of a reporter gene from a reporter construct.
The reporter gene can be any gene that confers a
selectable or screenable property when it is expressed.
Suitable reporter genes include the β-galactosidase gene,
antibiotic resistance genes, such as CAT, and genes
having a fluorescent expression product, such as the
green fluorescent protein gene.

An example of such a reporter system to screen
peptides for RNA binding activity is discussed in detail
below. A reporter system for screening peptide-peptide
binding is described by Fearon et al., Proc. Natl. Acad.
Sci. USA 89, 7958-7962 (1992). Peptide-peptide binding
is identified by reconstitution of the functional
activity of the yeast transcriptional activator GAL4 and
the resultant transcription of a GAL4-regulated reporter
gene. Reconstitution of GAL4 function results from
specific interaction between two chimeric proteins, one
of which contains the DNA binding domain of GAL4, and the
other of which contains a transcriptional activation domain.
Transcription of the reporter gene occurs if the two
chimeric proteins can form a complex that reconstitutes
the DNA binding and transcriptional activation functions
of GAL4. A reporter system for screening DNA-peptide
binding interactions has been described by Li &
Herskowitz, Science 262, 1870-4 (1993).

In screening methods employing a reporter
construct, the construct can be introduced into the
higher eucaryotic cells before protoplast fusion. The
construct is typically integrated into the cellular
genome to give a stable cell line. Alternatively, the
reporter construct can be introduced into recipient
higher eucaryotic cells in the course of protoplast
fusion. In this situation, the construct is first
introduced into the primary cells from which protoplasts
are made and is transferred to the higher eucaryotic
cells with other components of the primary cells.

In general, transferred nucleic acid library
members can be screened without substantial replication
(e.g., no more than about a 2- or 5-fold increase in
average copy number) of the library members in eucaryotic
cells. Replication is not necessary because the
essentially clonal nature of protoplast fusion can
transfer sufficient copies of nucleic acid library
members for screening without replication. Also, it is
not essential that library members be expressed in
recipient higher eucaryotic cells, as expression products
of the nucleic acids and/or secondary metabolites are
transferred with the nucleic acids. For most secondary
metabolites, and some peptides, expression is only
possible in procaryotic cells.

VI. Recovery of Library Members from Selected Eucaryotic
Cells

Library members are recovered from eucaryotic
cells surviving selection for transfer to further
procaryotic cells. Library members can sometimes be
recovered in small amounts from Hirt supernatants of
cells. However, higher yields are obtained if library
members are released from eucaryotic cells by an alkaline
lysis procedure. Library members are then transformed
into procaryotic cells, usually E. coli, for
amplification. Transformation is preferably by
electroporation, which gives the highest efficiency with
what are often small amounts of DNA, particularly if no
replication occurred in the eucaryotic cells. The
number of different library members recovered may range
from 1, 10, 50 or 500 to 10,000 or more. Usually, the
library members are recovered from a population of cells
surviving selection without clonal isolation of
individual cells.

Following electroporation, the procaryotic
cells are cultured to amplify selected library members.
Often the library members are subjected to further
round(s) of enrichment using the same principles as
before. That is, the procaryotic cells bearing selected
library members are used to make protoplasts for another
round of screening in eucaryotic cells. Procaryotic
cells can also be clonally isolated before performing
subsequent rounds of screening. Eventually, selected
nucleic acids are subject to further characterization.
When the nucleic acids encode peptides, further
characterization entails sequencing the nucleic acid and
then resynthesizing the peptide from the nucleic acid
sequence, in similar fashion to phage display methods.
The same analysis holds if the nucleic acids encode RNA
molecules. In situations where the nucleic acids encode
enzymes producing secondary metabolites, sequencing of
selected nucleic acids can be used to identify enzyme(s)
that lead to
production of a secondary metabolite having a desired
activity. The secondary metabolite itself can be
recovered from cells expressing these nucleic acids, and
characterized directly by conventional chemical methods
such as infrared spectroscopy or mass spectrometry.
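
The effect of repeating the screen can be illustrated with a simple enrichment model. The Python sketch below is a hedged illustration, not a description of measured performance: it assumes each round multiplies the ratio of active to inactive members by a constant factor, and both the function name and the numbers are hypothetical.

```python
def frequency_after_rounds(initial_frequency: float,
                           enrichment_per_round: float,
                           rounds: int) -> float:
    """Frequency of an active member after repeated rounds of screening.

    Each round is assumed to enrich active members over inactive ones by a
    constant factor, so f -> f*E / (f*E + (1 - f)).
    """
    frequency = initial_frequency
    for _ in range(rounds):
        enriched = frequency * enrichment_per_round
        frequency = enriched / (enriched + (1.0 - frequency))
    return frequency

# Hypothetical case: one active member per million, 100-fold enrichment per round.
after_three_rounds = frequency_after_rounds(1e-6, 100.0, 3)
```

With these assumed values, three rounds raise a one-in-a-million member to roughly half of the recovered population, which is why a small number of rounds can suffice before clones are sequenced.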

VII. Screening for RNA Binding Peptides

1. RNA Binding Protein and Recognition Sequences

Natural RNA binding proteins often have one
domain responsible for RNA binding and other domains
responsible for other functions. The domain responsible
for RNA binding can sometimes be recognized by a
characteristic motif. The most widely found RNA
recognition sequence or binding motif is the RNP motif.
The RNP motif is a 90-100 amino acid sequence that is
present in one or more copies in proteins that bind
pre-mRNA, mRNA, pre-ribosomal RNA and snRNA. The consensus
sequence and the sequences of several exemplary proteins
containing the RNP motif are provided by Burd and
Dreyfuss, supra. See also Swanson et al., Trends
Biochem. Sci. 13, 86 (1988); Bandziulis et al., Genes
Dev. 3, 431 (1989); Kenan et al., Trends Biochem. Sci. 16,
214 (1991). The consensus motif contains two short
consensus sequences, RNP-1 and RNP-2. Some RNP proteins
bind specific RNA sequences with high affinities
(dissociation constant in the range of 10"®-10"''^ M). Such
proteins often function in RNA processing reactions.
Other RNP proteins have less stringent sequence
requirements and bind less strongly (dissociation
constant -10"^-10"^ M). Burd & Dreyfuss, EMBO J. 13, 1197
(1994).

A second characteristic RNA binding motif
found in viral, phage and ribosomal proteins is an
arginine-rich motif (ARM) of about 10-20 amino acids.
RNA binding proteins having this motif include the HIV
Tat and Rev proteins. Rev binds with high affinity
(dissociation constant 10*^ M) to an RNA sequence termed
RRE, which is found in all HIV mRNAs. Zapp et al.,
Nature 342, 714 (1989); Dayton et al., Science 246, 1625
(1989). Tat binds to an RNA sequence termed TAR with a
dissociation constant of 5 x 10"^ M. Churcher et al., J.
Mol. Biol. 230, 90 (1993). For Tat and Rev proteins, a
fragment containing the arginine-rich motif binds as
strongly as the intact protein. In other RNA binding
proteins with ARM motifs, residues outside the ARM also
contribute to binding. Other families of RNA binding
proteins with different binding motifs are described by
Burd and Dreyfuss, supra.
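
As a rough guide to what a dissociation constant implies, the fraction of an RNA site occupied at equilibrium can be estimated from a one-site binding model. The Python sketch below is illustrative only: it assumes the protein is in large excess over the RNA, and the function name and the concentrations used are hypothetical rather than taken from the cited studies.

```python
def fraction_bound(protein_conc_molar: float, kd_molar: float) -> float:
    """Equilibrium occupancy of a single RNA site: [P] / ([P] + Kd)."""
    if protein_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return protein_conc_molar / (protein_conc_molar + kd_molar)

# Hypothetical comparison at 10 nM free protein.
tight = fraction_bound(1e-8, 1e-9)  # Kd ten-fold below the protein concentration
weak = fraction_bound(1e-8, 1e-6)   # Kd one hundred-fold above it
```

In this assumed example the tighter binder occupies about 90% of sites and the weaker binder about 1%, which is the kind of difference a reporter-based screen relies on.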

2. Screening Methods

The invention provides methods of screening
for RNA binding proteins that have desirable binding
characteristics to selected RNA sequences. The methods
can be used to isolate variants of known RNA binding
proteins having altered (usually strengthened) binding
characteristics. The methods are also useful for
isolating hitherto unknown RNA binding proteins to any
RNA sequence of interest. The unknown RNA binding
proteins may be natural proteins encoded by cDNA or
genomic libraries or synthetic peptides selected from a
random combinatorial library. The methods can also be
applied to screening a library of RNA recognition
sequences to an RNA binding protein of interest.

There are two components of the screening
system. The first component is a library encoding fusion
proteins, each of which has at least two moieties. The
first moiety is a peptide to be screened, which varies
between library members. The second moiety is a
transcriptional inducer, which is the same in different
library members. The transcriptional inducer is a
peptide capable of inducing transcription from a promoter
by binding to an RNA or DNA site proximal to the
promoter. Binding brings the transcriptional-inducing
domain of the fusion protein into proximity with the
promoter and polymerase bound thereto, thereby stimulating
expression of a gene linked to the promoter. Usually,
the natural RNA binding domain of a transcriptional
inducer is deleted from fusion proteins, since its
function is effectively replaced by the peptides being
screened.

In some arrangements, the fusion proteins
further comprise a linker or spacer polypeptide. The
linker is usually inserted between the peptide being
screened and the transcriptional inducer. A linker (or
spacer) refers to a molecule or group of molecules that
connects two molecules or two parts of a single molecule.
A linker serves to place the two molecules in a preferred
configuration, e.g., so that each domain is functional
without steric hindrance from the other. The spacer can
be as short as one residue, or five to ten residues, or
up to about 100 residues. The spacer residues may be
somewhat flexible, comprising polyglycine, or (GlyjSer)^
for example. Alternatively, rigid spacers can be formed
predominantly from Pro and Gly residues. Hydrophilic
spacers, made up of charged and/or uncharged hydrophilic
amino acids (e.g., Thr, His, Asn, Gln, Arg, Glu, Asp,
Met, Lys, etc.), or hydrophobic spacers made up of
hydrophobic amino acids (e.g., Phe, Leu, Ile, Gly, Val,
Ala) can be used to present the RNA binding site with a
variety of local environments.

WO 98/44147    PCT/US98/05740

The second component of the reporter system is
a reporter construct. The reporter construct encodes a
reporter gene (such as the GFP gene, β-galactosidase or
chloramphenicol acetyl transferase) operably linked to a
promoter and an RNA binding site. The choice of
promoter depends on the transcriptional inducer; that is,
transcription from the promoter should be capable of
being stimulated by the transcriptional inducer. The RNA
binding site can be the natural site recognized by a
natural RNA binding domain of the transcriptional inducer
or can be a heterologous site. If the latter, the site
can be an RNA site for a known RNA binding protein, or an
RNA site for which no RNA binding protein is known
but novel RNA binding peptides are sought. The RNA
binding site is usually positioned between the promoter
and the reporter coding sequence.

Both library members and the reporter 
construct are introduced into eucaryotic cells for the 
screen. The library members and reporter construct can 
be introduced concurrently or sequentially. Introduction 
of library members and/or reporter construct can be 
effected by the protoplast fusion method described above. 

The screen works as follows. Most of the
eucaryotic cells receive a nucleic acid library member
encoding a fusion protein in which the peptide moiety
lacks specific affinity for the RNA binding site on the
construct or a transcript thereof. In these cells, the
transcriptional inducer moiety is not brought into
proximity with the promoter on the reporter construct and
the reporter is expressed at basal levels. In one
or a few eucaryotic cells, the cell receives a nucleic
acid fragment encoding a fusion protein in which the
peptide moiety has specific affinity for the RNA binding
site in the reporter construct or a transcript thereof.
In such a cell, the fusion protein binds to the RNA
binding site on the construct or a transcript thereof via
the peptide moiety, bringing the transcriptional inducer
into proximity with the promoter on the reporter construct.
The transcriptional inducer thereby stimulates expression
of the reporter gene from the promoter. If the reporter
gene is GFP, cells showing increased expression of the
reporter can readily be identified by FACS sorting. The
FACS method can screen large numbers of cells in liquid
culture. A FACS machine can be programmed to isolate a
fraction of cells whose fluorescence exceeds a desired
limit.
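
The sorting decision described above amounts to a threshold on per-cell fluorescence. The Python sketch below shows one way to express it; it is not the software of any particular instrument. The rule for setting the limit as a multiple of the mean basal signal, the function names, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, List

def limit_from_background(background: List[float], fold_over_mean: float) -> float:
    """Set the sort limit as a multiple of the mean basal signal (assumed rule)."""
    return fold_over_mean * sum(background) / len(background)

def gate_bright_cells(fluorescence: Dict[str, float], limit: float) -> List[str]:
    """Return ids of cells whose fluorescence exceeds the sort limit."""
    return sorted(cell for cell, signal in fluorescence.items() if signal > limit)

# Hypothetical signals in arbitrary units.
basal = [10.0, 12.0, 8.0, 10.0]             # cells expressing reporter at basal level
limit = limit_from_background(basal, 5.0)   # five-fold over the basal mean
cells = {"cell_a": 11.0, "cell_b": 240.0, "cell_c": 49.0, "cell_d": 75.0}
selected = gate_bright_cells(cells, limit)
```

A stricter limit trades recovery of weak binders for fewer false positives, so the fold value would in practice be chosen from control cells run on the same day.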

A preferred transcriptional inducer is HIV-1
Tat. Tat is a potent activator of HIV-1 gene
transcription and is essential for viral replication.
Tat activates transcription by enhancing the processivity
of RNA polymerase II transcription complexes initiated at
the HIV-1 LTR (Kao et al., 1987; Feinberg et al., 1991;
Marciniak et al., 1991; Kato et al., 1992; Laspia et al.,
1993), possibly by recruiting or enhancing the activity
of cellular kinases that phosphorylate the CTD of RNA pol
II, thereby creating elongation-competent polymerases
(Parad et al., 1996; Yang et al., 1996; Zhou et al.,
1996). To function, Tat must bind to TAR, an RNA hairpin
located at the 5' end of nascent transcripts (Rosen et
al., 1985; Roy et al., 1990). Tat contains a
functionally defined activation domain (amino acids 1-48)
and an arginine-rich RNA-binding domain (amino acids
49-57) that also functions as a nuclear localization
signal (Dang et al., 1989; Hauber et al., 1989; Ruben et
al., 1989). The activation and RNA-binding domains are
modular and separable. Tat can activate transcription
when bound to the nascent transcript through heterologous
RNA-protein interactions (Selby et al., 1990; Southgate
et al., 1990) or even when bound to DNA (Southgate et
al., 1991). Thus, by assembling a reporter construct in
which TAR is replaced with a "bait" RNA, it is possible
to screen a library of fusion proteins comprising
peptides linked to a Tat polypeptide containing at least
the activation domain of Tat. For example, the bait RNA
can be HIV RRE with a view to identifying peptides that
antagonize binding of Rev to HIV RRE and thereby abort
the HIV infective cycle.

An analogous screen can be used to isolate RNA
sequences that have specific binding affinity for a
selected peptide. In this situation, a nucleic acid
library is designed containing variants of a first
construct. In each member of the library, the first
construct contains a reporter coding sequence in operable
linkage with a promoter, as described before. Between
the coding sequence and promoter, there is a segment
encoding potential RNA binding sequences that varies
between members of the library. A second construct
encodes a fusion protein comprising a transcriptional
inducing domain linked to a selected peptide for which
RNA binding sites are to be identified.

The library members are introduced into
eucaryotic cells via protoplast fusion with primary
cells, as described before. Preferably, the library
members are transcribed in the primary cells so that
transcripts of the first construct differing in the
potential RNA binding site region are introduced into the
eucaryotic cells together with the corresponding first
constructs. The second construct can be transferred to
eucaryotic cells from the primary cells concurrently with
the library members. Preferably, the second construct is
also expressed in the procaryotic cells so multiple
copies of fusion protein are transferred to the
eucaryotic cells. Alternatively, the library members and
second construct can be introduced into eucaryotic cells
sequentially.

The screen works in much the same manner as
described above. That is, in eucaryotic cells having
received a library member with an RNA binding site with
specific affinity for the peptide moiety of the fusion
protein, the fusion protein binds to the library member
or a transcript thereof, inducing expression of the
reporter gene in excess of basal levels. In cells having
received a library member lacking an RNA binding site
with specific affinity for the peptide moiety, the fusion
protein does not bind to the library member or its
transcript and the reporter is expressed at only basal
levels.

RNA binding polypeptides isolated by the
methods described above have a variety of uses. In one
application, RNA binding polypeptides are used in
therapeutic methods to block the life-cycle of pathogenic
microorganisms, including viruses, such as HIV, and
bacteria. Some synthetic RNA binding polypeptides are
used as antagonists of a naturally occurring RNA binding
protein. A synthetic polypeptide occupies the target
site in competition with the natural protein or RNA
without fulfilling the physiological role of the natural
protein. The synthetic polypeptide thereby antagonizes
the natural protein and aborts the life-cycle of a
pathogenic microorganism. In such methods, the synthetic
RNA binding polypeptide preferably has a higher binding
affinity than the natural protein, and lacks functional
domains (other than the binding domain) present in the
natural protein. Other RNA binding polypeptides bind
unique sequences on the pathogen's mRNA for which there
may be no naturally occurring RNA binding protein. These
polypeptides interfere with replication or translation of
the pathogenic microorganism. For example, the RNA
binding protein can occlude the Shine-Dalgarno sequence
or initiation codon of a bacterial mRNA, thereby
preventing translation. RNA binding sites recovered by
the claimed methods can be used in an analogous manner to
antagonize the binding of natural RNA sequences. In
mammalian diseases resulting from impairment or loss of a
natural RNA binding protein, treatment with an exogenous
RNA binding protein or an analog that substitutes for, or
agonizes, a natural protein serves to ameliorate the
disease. Some of these synthetic polypeptides possess
both an RNA binding domain and a functional domain also
present in the naturally occurring protein.

VIII. Analogs

Binding peptides isolated by the methods can
serve as lead compounds for the development of derivative
compounds. The derivative compounds can include chemical
modifications of amino acids or replace amino acids with
chemical structures. The analogs should have a
stabilized electronic configuration and molecular
conformation that allows key functional groups to be
presented to the binding site in substantially the same
way as the lead peptide. In particular, the non-peptidic
compounds have spatial electronic properties which are
comparable to the polypeptide binding region, but are
typically much smaller molecules than the polypeptides,
frequently having a molecular weight below about 2 kD and
preferably below about 1 kD.

Identification of such non-peptidic compounds
can be performed through use of techniques known to those
working in the area of drug design. Such techniques
include, but are not limited to, self-consistent field
(SCF) analysis, configuration interaction (CI) analysis,
and normal mode dynamics analysis. Computer programs for
implementing these techniques are readily available. See
Rein et al., Computer-Assisted Modeling of
Receptor-Ligand Interactions (Alan Liss, New York, 1989).



IX. Diagnostic and Therapeutic Compositions 

Peptides and other compounds identified by the
above methods or their analogs are formulated for
therapeutic use as pharmaceutical compositions. The
compositions may also include, depending on the
formulation desired, pharmaceutically-acceptable,
non-toxic carriers or diluents, which are defined as vehicles
commonly used to formulate pharmaceutical compositions
for animal or human administration. The diluent is
selected so as not to affect the biological activity of
the combination. Examples of such diluents are distilled
water, physiological saline, Ringer's solution, dextrose
solution, and Hank's solution. In addition, the
pharmaceutical composition or formulation may also
include other carriers, adjuvants, or nontoxic,
nontherapeutic, nonimmunogenic stabilizers and the like.

The polypeptides and compounds isolated by the
methods are also useful in diagnostic methods. For
example, an RNA binding polypeptide with a specific
affinity for an RNA sequence encoded by a pathogenic
microorganism can be used to detect the microorganism.
In one assay format, the polypeptide is immobilized to a
support, optionally via a linker, and a sample, which may
or may not contain RNA from the microorganism, is
contacted with the support. Binding of the RNA from the
microorganism to the support can be detected by
competition with binding of a labelled synthetic RNA
recognition sequence to the immobilized RNA binding
polypeptide. RNA binding polypeptides are also useful in
controlling the growth of cells in culture.

The following examples are provided to 
illustrate but not to limit the invention. 



EXAMPLES
1. Experimental Procedures

a. Plasmid construction 

The HIV-1 LTR-GFP reporter plasmid (containing
wild-type TAR) was constructed by inserting a gene
encoding the Ala65 GFP mutant (containing a GCC alanine
codon in place of the TCT serine codon) between the Hind
III and Xho I sites in pCDNA3 (Invitrogen), and by
replacing the CMV promoter of pCDNA3 with the HIV-1 LTR
from pHIV-CAT (Rosen et al., 1985). BIV TAR and RRE IIB
GFP reporters were constructed by replacing the HIV-1
LTR-TAR region with corresponding regions from BIV TAR
and RRE IIB CAT reporters (Tan et al., 1993; Chen and
Frankel, 1994). The U1 GFP reporter was constructed by
replacing the top part of TAR (+20 to +40) with an
oligonucleotide containing the 22-nucleotide U1 snRNA
hairpin II (Oubridge et al., 1994). Tat-Rev^^, Tat-Arg^^,
and selected Tat-peptide hybrids were constructed by
cloning oligonucleotide cassettes encoding each peptide
plus four alanines at the N-terminus and four alanines
and an arginine at the C-terminus after amino acid 49 of
Tat (using an Eag I site at the end of the activation
domain) in vectors derived from pSV2tat72 (Frankel and
Pabo, 1988). The Tat-U1A fusion was constructed by
fusing oligonucleotide cassettes encoding three glycines
followed by residues 2-102 of U1A to Tat^.^ (kindly
provided by S. Landt). All constructs were confirmed by
dideoxynucleotide sequencing.

b. Stable Cell Lines

HeLa cell lines containing stably integrated
HIV-1 LTR-GFP reporters were selected using neomycin
(G418). Cells were transfected by lipofection with
plasmids encoding each reporter and were grown in
Dulbecco's modified Eagle medium (DMEM) containing 10%
fetal bovine serum and 1 mg/ml neomycin. The medium was
changed every four days. Following serial dilution in
96-well plates and 3-4 weeks of growth, single colonies
were chosen and tested for GFP expression after
transfecting with plasmids expressing the corresponding
Tat fusion. Clones with low backgrounds in the absence
of the Tat fusion and bright fluorescence in the presence
of the fusion, as judged by fluorescence microscopy, were
selected for expansion.

c. Protoplast Preparation and Fusion 

Single DH-5α colonies containing appropriate
plasmids were inoculated into 5 ml LB medium containing
100 µg/ml ampicillin, and cells were grown overnight at
37°C with moderate shaking. Overnight cultures (0.5 ml)
were added to 50 ml LB containing ampicillin, cells were
grown at 37°C to A^^ = 0.7-0.8, chloramphenicol was added
to 250 µg/ml, and plasmids were amplified by growing for
an additional 16 hr at 37°C. Cells were centrifuged at
2000 x g for 10 min at 4°C, cell pellets were resuspended
in 10 ml 50 mM Tris-Cl (pH 8.0) and centrifuged again at
2000 x g for 10 min at 4°C. Protoplasts were prepared
using a previously described protocol (Sandri-Goldin et
al., 1981) as follows. Cells were resuspended in 2.5 ml
chilled 20% sucrose in 50 mM Tris-Cl (pH 8.0), 0.5 ml 5
mg/ml lysozyme (freshly prepared in 0.25 M Tris-Cl, pH
8.0) was added, and cells were incubated on ice for 5
min. One ml 0.25 M EDTA (pH 8.0) was added and cells
were incubated on ice for an additional 5 min. One ml 50
mM Tris-Cl (pH 8.0) was added slowly and the mixture was
incubated at 37°C until all bacteria were converted to
protoplasts, as monitored by phase-contrast microscopy
(bacteria are rod-shaped whereas protoplasts are round).
For DH-5α cells, conversion to protoplasts takes about 15
min. Protoplasts were then carefully and slowly diluted
with 20 ml room temperature serum-free DMEM containing
10% sucrose and 10 mM MgCl2 and suspensions were kept at
room temperature for 15 min. These preparations contain
~1.5x10^ protoplasts per ml and are ready for fusion.
Prior to fusion, HeLa cells were split into 6-well
plates and grown for 24 hr to ~70% confluence.
Medium was removed and cells were washed with 4 ml
serum-free DMEM per well. Four ml of protoplast suspension
(~6x10^ protoplasts; protoplasts should be in > 1000-fold
excess over cells) was added and plates were centrifuged
at 1650 x g for 10 min at 25°C. Supernatants were removed
carefully by suction. Two ml pre-warmed 50% (v/v)
PEG1000 or 50% (w/v) PEG1500 was added at room
temperature, left for 2 min, and removed by suction.
Cells were washed three times using 2 ml serum-free DMEM,
and 4 ml DMEM containing 10% fetal bovine serum,
penicillin, streptomycin, and kanamycin was added.
Medium was changed after 24 hr, and cells were grown for
an additional 24 hr before examining fluorescence.

d. FACS

Protoplast-fused or transfected cells were
harvested by trypsinization after 48 hours and were
resuspended at a concentration of 10^ cells/ml in DMEM
containing 10% cell dissociation buffer (GIBCO-BRL), 0.3%
fetal bovine serum, and 1 µg/ml propidium iodide.
Samples were analyzed or sorted by FACS using an argon
laser to excite cells at 488 nm and a 530 ± 30 nm band
pass filter to detect GFP emission. For FACS scans,
10,000 cells were typically analyzed using a FACScan flow
cytometer (Becton Dickinson, San Jose, CA). For FACS
sorting, 2000-3000 cells were sorted per second,
propidium iodide-staining cells were removed
electronically, and cells of interest were collected into
Eppendorf tubes containing 0.5 ml DMEM. Sorting was
performed using a Howard Hughes Medical Institute (UCSF)
FACSTAR+ cell sorter (Becton Dickinson). FACS data were
analyzed using CellQuest software.

e. Plasmid Recovery from FACS-Sorted Cells

FACS-sorted cells were mixed with 20,000 HeLa
cells and centrifuged at 15000 x g for 5 min at 4°C. Cell
pellets were resuspended in 10 µl TE buffer (pH 8.0)
containing 0.2 mg/ml tRNA, cells were lysed by adding 20
µl 1% SDS, 0.2 N NaOH, and suspensions were incubated on
ice for 5 min. Fifteen µl 3 M NaOAc (pH 4.8) was added,
mixtures were incubated on ice for 10 min and then
centrifuged at 15000 x g for 5 min at 4°C. Supernatants
were transferred to fresh tubes and extracted with an
equal volume of phenol:chloroform (1:1). Plasmid DNAs
were precipitated by adding 1 µl/ml glycogen (Sigma), 0.1
vol 3 M NaOAc (pH 4.8), and 3 vol ethanol (-20°C),
incubating on dry ice for 1 hr, and centrifuging at 15000
x g for 30 min. Pellets were washed with 70% ethanol
(-20°C), air dried, and dissolved in 1 µl distilled water.

f. Preparation of Electrocompetent Cells and
Electroporation

Single DH-5α colonies were inoculated into 5
ml LB, cells were grown at 37°C with moderate shaking
overnight, and 2.5 ml was used to inoculate 500 ml LB
cultures. Cells were grown at 37°C with shaking to A600 =
0.5-0.6 and then chilled in ice water for 15 min. Cells
should be kept at 0°C for all subsequent steps. Cells
were centrifuged in a swinging bucket rotor at 4000 x g
for 20 min, pellets were resuspended in 500 ml ice-cold 1
mM HEPES (pH 7.0), centrifuged at 4000 x g for 20 min, and
the HEPES wash was repeated. Cells were then resuspended
in 250 ml ice-cold water, centrifuged at 4000 x g for 20
min, and the water wash was repeated. These four washing
steps are critical for obtaining highly electrocompetent
cells and must be performed carefully to avoid loose cell
pellets. Cells were then resuspended in 100 ml ice-cold
glycerol, centrifuged at 4000 x g for 10 min, and
resuspended in one half the cell volume of ice-cold
glycerol. Cell densities are typically ~3x10^/ml. If
frozen electrocompetent cells are desired, 100 µl
aliquots can be frozen on dry ice and stored at -80°C.
Frozen cells generally give 2-3 fold fewer transformants
than fresh cells.
For electroporation, recovered plasmid DNAs (1
µl) were added to 50 µl electrocompetent cells on ice,
and cells were electroporated at 1.8 kV, 25 µF, and 200
ohms using 1 mm cuvettes (BioRad). One ml room
temperature SOC medium was immediately added to the
cuvettes, cells were transferred to culture tubes,
incubated with moderate shaking for 60 min at 37°C, and
spread on LB plates containing 100 µg/ml ampicillin.
Using small amounts of supercoiled pUC19 plasmid DNA
(0.1-10 pg), efficiencies of ~1.5x10^ colonies/µg DNA
were typically obtained.

g. Combinatorial Peptide Library Design and
Screening

A degenerate oligonucleotide
(5'-ATCTCTTACGGCCGTGCCGCT
GCAGCCXXYAGTOCXYXXYAGGCGAXXYAGGAGACGGCGACGTCGCAGAGCTGCCGC^
GCAAGATGACTCGAGACTAGTGGA-3', where X is an A:G:C mixture at
a 1:1:1 ratio and Y is a G:T mixture at a 1:1 ratio) was
synthesized encoding the arginine-rich peptide library.

A primer (5'-TCCACTAGTCTCGAG-3') was annealed to the
degenerate oligonucleotide, and double-stranded DNA was
synthesized using Sequenase 2.0 (USB). The double-stranded
product (~0.1 µg) was digested with Eag I and
Xho I and ligated into 5 µg Eag I-Xho I-digested
pSV2Tat72 to generate fusions to amino acid 49 of Tat.
The encoded peptides contain four randomized positions
within a stretch of fourteen arginines,
AAA70CRXXRRXRRRRRRRAAAAR, where X represents any of twelve
amino acids in the bold box in Figure 6A. Ligation
products were phenol extracted, ethanol precipitated, and
~0.2 µg was electroporated into DH-5α cells. Immediately
following electroporation, 7x10^7 individual clones were
obtained, 700-fold larger than the sequence complexity of
the library. Cells were amplified to 2x10^ by growing in
LB for 3 hr, centrifuged, resuspended in 5 ml glycerol
stock buffer (50% LB, 32.5% glycerol, 50 mM MgSO4, 12.5 mM
Tris-Cl, pH 8.0), and stored at -80°C in 1 ml aliquots.
Sequences of 18 individual colonies from the amplified
cells indicated no sequence bias.
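
The degeneracy of the XXY randomization scheme (X = A, G, or C; Y = G or T) can be checked by enumeration. The following is a minimal sketch, assuming the standard genetic code; the function and variable names are illustrative only and are not part of the described procedure.

```python
from itertools import product

# Standard genetic code entries for the 18 codons reachable when the
# first two positions are A, G, or C and the third position is G or T.
CODON_TABLE = {
    "AAG": "Lys", "AAT": "Asn", "AGG": "Arg", "AGT": "Ser",
    "ACG": "Thr", "ACT": "Thr", "GAG": "Glu", "GAT": "Asp",
    "GGG": "Gly", "GGT": "Gly", "GCG": "Ala", "GCT": "Ala",
    "CAG": "Gln", "CAT": "His", "CGG": "Arg", "CGT": "Arg",
    "CCG": "Pro", "CCT": "Pro",
}

X_BASES = "AGC"  # equimolar mixture at codon positions 1 and 2
Y_BASES = "GT"   # equimolar mixture at codon position 3


def xxy_codons():
    """Return every codon encoded by the XXY degenerate triplet."""
    return ["".join(bases) for bases in product(X_BASES, X_BASES, Y_BASES)]


def library_stats(randomized_positions=4):
    """Count codons, amino acids, and library complexity."""
    codons = xxy_codons()
    amino_acids = sorted({CODON_TABLE[c] for c in codons})
    return {
        "codons": len(codons),
        "amino_acids": amino_acids,
        "dna_complexity": len(codons) ** randomized_positions,
        "peptide_complexity": len(amino_acids) ** randomized_positions,
    }


if __name__ == "__main__":
    for key, value in library_stats().items():
        print(key, value)
```

Because T is excluded from the first codon position, none of the three stop codons (TAA, TAG, TGA) can arise at a randomized position.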

One ml of the library stock (~10^ cells) was
inoculated into 100 ml LB/ampicillin, and cells were
grown at 37°C to A600 = 0.7-0.8. Parallel 50 ml cultures
containing Tat-Rev14 and Tat-Arg14 plasmids were grown.
Chloramphenicol was added to 250 µg/ml and cells were
grown for an additional 16 hr. Protoplasts were prepared
and fused to HeLa cells containing a stably integrated
HIV-1 LTR RRE IIB-GFP reporter. After 48 hours, 10,000
positive control cells (Tat-Rev14) and 10,000 negative
control cells (Tat-Arg14) were analyzed by FACS to
estimate fusion efficiency and to establish the sorting
window. Library-fused cells (~10^) were sorted by FACS
and positive cells were collected. Plasmids were
recovered by alkaline-lysis phenol-extraction and
electroporated into DH-5α cells. Resulting colonies were
harvested from 5 plates using 5 ml LB per plate, cells
were centrifuged, resuspended in 4 ml glycerol stock
buffer, and 2 ml was used to inoculate 100 ml
LB/ampicillin to prepare protoplasts for the next round
of selection. The cycle was repeated for three rounds,
until the fraction of GFP-positive cells was similar to
that of the positive control.

h. CAT Assays

Levels of activation by the Tat fusion
proteins were assessed by cotransfecting 50 ng of an HIV-1
LTR RRE IIB-CAT reporter plasmid (Tan, R. et al.
(1993). RNA recognition by an isolated alpha helix. Cell
73:1031-1040) and 0.2-25 ng Tat expression plasmids into
HeLa cells using lipofectin. Total plasmid DNA was
adjusted to 1 µg with pUC19. CAT activities were assayed
after 48 hr using an appropriate amount of cell extract
as described (Calnan, B.J. et al. (1991). Analysis of
arginine-rich peptides from the HIV Tat protein reveals
unusual features of RNA-protein recognition. Genes Dev
5:201-10), and activities

2. Results

The basic protocol for screening RNA-binding
libraries is outlined in Figure 1. The Tat activation
domain, or in some cases full-length Tat, was fused to a
library and an HIV-LTR GFP reporter was constructed in
which an RNA site of interest replaced the TAR site. The
library was delivered into reporter-containing HeLa cells
by protoplast fusion under conditions in which
approximately one bacterium fused to one cell. After two
days, HeLa cells expressing high levels of GFP were
sorted by FACS, and plasmids were extracted and
electroporated into bacteria. The procedure was repeated
using protoplasts from the enriched population until most
cells expressed GFP, and resulting plasmids were
sequenced.

To test whether the Tat-GFP reporter system
could be used to monitor specific RNA-protein
interactions, we first constructed a set of reporters
containing HIV TAR, RRE IIB (the high-affinity Rev
binding site), bovine immunodeficiency virus (BIV) TAR,
and U1 snRNA hairpin II (the U1A protein binding site),
and a set of Tat fusions containing the RNA-binding
domains from HIV Tat, Rev, BIV Tat, and the U1A protein
(Calnan et al., 1991; Tan et al., 1993; Chen et al.,
Frankel, 1994; Oubridge et al., 1994; Harada et al.,
1996). GFP reporters were transfected into HeLa cells by
lipofection either alone or with Tat fusions, and
expression was monitored by fluorescence microscopy and
FACS. GFP expression was observed only with the cognate
partners. The activation domain of Tat alone (Tat1-48)
did not activate through HIV TAR, RRE, or BIV TAR
reporters, and full-length Tat (Tat1-72) did not activate
through U1 hpII (Figure 2). U1A was fused to Tat1-72 to
ensure nuclear localization (the arginine-rich domain of
Tat functions as an NLS) whereas the other RNA-binding
domains were fused to Tat1-48 because they provide their
own NLS. When reporters were cotransfected with the
corresponding Tat fusion proteins, GFP expression
increased ~10-100-fold (Figure 2). No activation was
observed through noncognate RNAs, as monitored either by
FACS or fluorescence microscopy, indicating that the
Tat-GFP reporter system accurately reflects specific
RNA-protein interactions. The Tat-U1A fusion, which
contains the arginine-rich RNA-binding domain of Tat,
also functioned through HIV TAR.

We examined the properties of GFP variants
having different fluorescence intensities in stable
reporter-containing cell lines and by transient
transfection. The signals obtained with Ala65 and Thr65
mutants (Cubitt et al., 1995; Cormack et al., 1996) were
significantly higher than wild-type GFP, and the signal
from a GFP gene in which codons were optimized for
mammalian expression (EGFP; Clontech; Haas et al., 1996)
was even higher. However, because EGFP produced
substantial fluorescence even in the absence of Tat, we
chose to use the Ala65 mutant, which gives slightly
brighter fluorescence than Thr65, for subsequent
experiments.

a. Introduction of plasmids by protoplast fusion

To facilitate screening of relatively large
libraries (~10^-10^ members), an approach was designed to
deliver many copies of a single library member into each
recipient cell. In principle, clonal or near-clonal
delivery allows relatively weak binders to produce
detectable GFP signals and reduces the background from
other members of the library, while delivery of a
sufficient number of copies might allow plasmid recovery
without additional amplification. Unlike bacteria or
yeast where plasmid segregation results in clonal
colonies, transfection of mammalian cells is believed to
involve the uptake of a large population of plasmids
[xx]. We reasoned that "pre-packaging" the plasmid DNA
might allow delivery of a more homogeneous population,
and that bacterial protoplast fusion (Schaffner, 1980;
Seed et al., 1987) might provide a good vehicle to deliver
many copies of a single plasmid. With chloramphenicol
amplification, as many as 3000 copies of a ColE1
origin-containing plasmid can be present in E. coli
(Clewell, 1972).

To test whether protoplast fusion would be
suitable in our system, we first fused protoplasts
containing amplified pSV2Tat1-72 or pSV2Tat1-48 plasmids
into HeLa cells containing a stably integrated HIV-1
LTR-TAR-GFP reporter. Approximately 10% of cells fused
with pSV2Tat1-72 displayed high GFP fluorescence as
monitored by microscopy and FACS analysis (Figure 3A)
whereas fusion to pSV2Tat1-48 produced virtually no
signal. Remarkably, diluting pSV2Tat1-72 protoplasts
with increasing amounts of inactive pSV2Tat1-48
protoplasts resulted in a proportional decrease in the
number of GFP-expressing cells, but not a proportional
decrease in fluorescence intensity (Figures 3A and 3B).
This result suggests that, statistically, few protoplasts
(perhaps as few as one) delivered their contents into each
HeLa cell, even at relatively high (10-20%) fusion
efficiencies. In contrast, the same ratio of plasmids
delivered by lipofection resulted in a proportional
decrease in fluorescence intensity and was too low to be
detected with 1% pSV2Tat1-72, as expected if cells were
randomly sampling the distribution of plasmids in the
transfection mixture. Thus, it appears that protoplast
fusion results in near-clonal delivery of plasmids into
HeLa cells even though efficient fusion requires a large
excess (>1000) of protoplasts to cells (Rassoulzadegan et
al., 1982) and many protoplasts bind to each cell as
judged by light microscopy. Clonal delivery may arise
because fusion is very inefficient or because delivery of
so many plasmids saturates the cellular DNA entry system
and precludes uptake of additional plasmids.
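
The contrast between the two delivery modes in the dilution experiment can be illustrated with a deliberately simplified model. This is only a sketch under stated assumptions: each fused cell is taken to receive the contents of exactly one protoplast, each lipofected cell is taken to sample the whole plasmid mixture, and signal is taken to scale linearly with the fraction of active plasmid in a cell. The 10% efficiency is the approximate value reported above; the function names and the unit signal value are hypothetical.

```python
def clonal_delivery(active_fraction, fusion_efficiency=0.10, signal_per_cell=1.0):
    """Protoplast-fusion model: one protoplast delivers its contents per cell.

    Returns (fraction of cells that are GFP positive,
             mean signal of a positive cell).
    """
    positive_fraction = fusion_efficiency * active_fraction
    # A positive cell received only active plasmid, so its signal is undiluted.
    positive_signal = signal_per_cell if active_fraction > 0 else 0.0
    return positive_fraction, positive_signal


def mixed_delivery(active_fraction, uptake_efficiency=0.10, signal_per_cell=1.0):
    """Lipofection model: every transfected cell samples the plasmid mixture.

    Returns (fraction of cells carrying some active plasmid,
             mean signal of such a cell).
    """
    carrying_fraction = uptake_efficiency if active_fraction > 0 else 0.0
    # Signal is diluted in proportion to the inactive plasmid taken up.
    diluted_signal = signal_per_cell * active_fraction
    return carrying_fraction, diluted_signal


if __name__ == "__main__":
    for f in (1.0, 0.1, 0.01):
        print(f, clonal_delivery(f), mixed_delivery(f))
```

Under these assumptions, diluting the active plasmid lowers the number of positive cells in the clonal model but lowers the per-cell signal in the mixed model, which is the qualitative pattern described for Figures 3A and 3B.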

b. Recovery of plasmids from FACS-sorted cells

Though in principle protoplasts deliver many
plasmids into a cell, it was unclear whether a
substantial fraction of plasmid DNA would remain intact
after the 48 hours required to express GFP and sort
cells, and whether the few remaining plasmids could be
efficiently recovered. Previous expression cloning
strategies have relied on plasmid replication in
recipient cells, most often using an SV40 origin and
T-antigen-expressing cells (such as COS) to amplify
plasmids episomally (Seed, 1995). Indeed, without
amplification recovery was quite poor in our hands and
required modification of the cell lysis procedure and
preparation of highly electrocompetent cells (~1.5 x 10^
colonies/µg pUC19 with DH-5α cells) to achieve reasonable
efficiencies. To test plasmid recovery from the small
number of positive cells expected from a library screen,
we fused pSV2Tat1-72 protoplasts into HeLa GFP reporter
cells and collected positive cells by FACS sorting. Hirt
supernatants (Hirt, 1967) prepared from ~50 cells
typically yielded <0.1 colony per sorted cell. In
contrast, an alkaline-lysis phenol-extraction protocol in
which untransfected HeLa cells, tRNA, and glycogen were
added to reduce plasmid loss yielded ~1 colony per sorted
cell (Figure 4). Although more efficient recovery would
help ensure that individual plasmids are not lost from
the sorted population, it is possible to screen
reasonably sized libraries (~10^ members) using the
current procedure and sorting multiple representatives of
each positive cell (see Discussion).
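
The benefit of sorting multiple representatives of each positive clone can be estimated with a simple calculation. The sketch below assumes, purely for illustration, that colony yield per sorted cell is Poisson distributed and independent between cells; the source reports only the average yields (about 1 colony per sorted cell for the alkaline-lysis protocol and fewer than 0.1 for Hirt supernatants). The function names are hypothetical.

```python
import math


def recovery_probability(cells_sorted, colonies_per_cell=1.0):
    """Probability that a clone yields at least one colony.

    Assumes a Poisson-distributed colony count with a mean of
    colonies_per_cell for each sorted cell carrying the clone.
    """
    return 1.0 - math.exp(-colonies_per_cell * cells_sorted)


def representatives_needed(target_probability, colonies_per_cell=1.0):
    """Smallest number of sorted cells per clone that reaches the target."""
    return math.ceil(-math.log(1.0 - target_probability) / colonies_per_cell)


if __name__ == "__main__":
    for rate in (1.0, 0.1):
        print(rate, recovery_probability(1, rate), representatives_needed(0.95, rate))
```

Under this assumption, a single sorted cell is recovered about 63% of the time at 1 colony per cell, three representatives give roughly 95% recovery, and at 0.1 colony per cell about thirty representatives would be needed for the same confidence.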

c. Screening for positive clones in a mock library

To mimic the situation encountered in a
library screen, we next tested the ability to recover a
small number of active plasmids from a large pool of
inactive plasmids. Protoplasts containing pSV2Tat1-72
and pSV2Tat1-48 were mixed in a 1:10^5 ratio and fused to
HeLa GFP reporter cells. After 48 hours, 1.1 x 10^ cells
were sorted and 893 GFP-positive cells were collected
(Figure 5, Round 1). Plasmids were extracted,
electroporated into DH-5α cells, and 1140 colonies were
obtained. Twenty colonies were analyzed by PCR and all
contained pSV2Tat1-48. In the second round, protoplasts
were prepared from a mixture of the 1140 colonies, 3 x 10^
cells were sorted, and 710 positive cells were collected
(Figure 5, Round 2). From these, 800 colonies were
obtained, and of twenty analyzed by PCR, two contained
pSV2Tat1-72 and the remainder contained pSV2Tat1-48. In
the third round, 5x10^ cells were sorted, 3000 positive
cells were collected, and 890 colonies were obtained (for
technical reasons, cells were sorted after 96 hours and
plasmid recovery was reduced). Twelve of twenty colonies
analyzed by PCR contained pSV2Tat1-72. Thus, positive
clones were enriched from 1 in 10^5 to 60% after three
rounds of screening. A similar level of enrichment was
observed after three rounds of screening using protoplast
fusion and a replicating vector in COS cells (Seed et
al., 1987).
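
The enrichment reported for the mock library can be tabulated directly from the PCR counts. The sketch below uses only the numbers given above (0, 2, and 12 pSV2Tat1-72 colonies out of twenty analyzed in rounds 1 to 3) and takes the starting ratio as 1 in 10^5; the names are illustrative. A count of 0 of 20 means the active plasmid was below the detection limit of the sample, not necessarily absent.

```python
START_FRACTION = 1e-5  # pSV2Tat1-72 : pSV2Tat1-48 mixed at 1 in 10^5

# round number -> (colonies containing pSV2Tat1-72, colonies analyzed by PCR)
PCR_COUNTS = {1: (0, 20), 2: (2, 20), 3: (12, 20)}


def positive_fraction(positives, analyzed):
    """Fraction of analyzed colonies carrying the active plasmid."""
    return positives / analyzed


def enrichment_table(counts=PCR_COUNTS, start=START_FRACTION):
    """Map each round to (observed fraction, fold enrichment over the start)."""
    table = {}
    for round_number, (positives, analyzed) in sorted(counts.items()):
        fraction = positive_fraction(positives, analyzed)
        table[round_number] = (fraction, fraction / start)
    return table


if __name__ == "__main__":
    for round_number, (fraction, fold) in enrichment_table().items():
        print(f"round {round_number}: {fraction:.0%} positive, ~{fold:.0f}-fold")
```

With these inputs the cumulative enrichment after three rounds is about 6 x 10^4-fold, with most of the measurable gain appearing between rounds 1 and 3 once the active plasmid rises above the twenty-colony detection limit.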

d. Identification of tight RRE-binding
peptides from a combinatorial library

Previous studies have shown that arginine-rich
peptides can bind to RNA sites with high affinities and
specificities using a variety of conformations for
recognition (Calnan et al., 1991; Tan et al., 1993; Chen
et al., 1995; Harada et al., 1996). For example, a Rev
peptide binds to the RRE in an α-helical conformation
whereas a BIV Tat peptide binds to BIV TAR as a β-hairpin
(Puglisi et al., 1995; Ye et al., 1995; Battiste et al.,
1996; Ye et al., 1996). In each case, few amino acids
other than arginine provide specific contacts to the RNA
(see Figure 6A), leading to the hypotheses: 1) that
specific RNA-binding peptides could have evolved
relatively easily beginning with polyarginine and 2) that
it might be possible to identify novel RNA-binding
peptides from combinatorial libraries restricted to
relatively few types of amino acids (Harada et al., 1996;
Harada et al., Frankel, 1997). To further explore these
hypotheses, we designed a combinatorial library in which
four residues within a stretch of fourteen arginines were
randomized using twelve hydrophilic or charged amino
acids (Figures 6A and 6B). The randomized positions
correspond to non-arginine residues in Rev (Figure 6B),
and the library contains 18^4 (~1x10^5) codon sequences
encoding 12^4 (~2.1 x 10^4) peptides. The library was
fused to the Tat activation domain in the context of
flanking alanines to help stabilize α-helical
conformations (Tan et al., 1993; Tan et al., 1994), and
the library was screened for tight RRE binders using a
HeLa cell line containing a stably integrated HIV-1
LTR-RRE IIB-GFP reporter. Protoplasts containing Tat-Rev14
and Tat-Arg14 were used as positive and negative controls
(Figure 6C), and the sorting window was set to identify
fusion proteins with higher activities than Tat-Rev14,
presumably reflecting tighter binding to the RRE IIB
site. Three rounds of screening were performed; 7 x 10^
cells were sorted in the first round and 800 positive
cells were collected, and increasing numbers of strong
GFP expressors were observed in the two subsequent rounds
(Figure 6C). Plasmids from 51 individual clones were
tested for activation of the RRE IIB-GFP reporter and 36
showed high-level expression by fluorescence microscopy.
Six unique sequences were found (Figure 7), two
containing glutamine at position 7 (clones 1 and 2) and
four containing frameshifts near the C-terminus of the
peptide that introduced a glutamine followed by
additional basic residues (clones 3-6). An additional
screen was performed in which the sorting window was set
slightly lower, and 15 additional RRE binders were found,
12 containing at least one glutamine, predominantly at
position 7 (Figure 7).

To more quantitatively assess RNA-binding
activities and to compare the activities observed with
the GFP reporter to a known reporter, we measured
activities of the selected Tat fusions using an HIV-1 LTR
RRE IIB-CAT reporter which shows a tight correlation
between in vivo activation and specific in vitro
RNA-binding affinities (Tan et al., 1993; Tan et al.,
1994; Symensma et al., 1996). All Tat fusions that
activated GFP expression to high levels (clones 1-6) also
activated CAT expression to high levels, and the best
fusions were 5-10-fold more active than Tat-Rev14 (Figure
8A). Binding to the RRE IIB site is specific as judged
by the inability of the fusions to activate through an
RNA-binding mutant reporter in which G46:C74 was changed
to C:G (Tan et al., 1993).



e. Glutamine-mediated binding specificity

Most of the selected RRE binders contained at
least one glutamine residue, most often located at a
position corresponding to Asn40 of Rev. This was
especially surprising given the recent report of a
change-of-specificity Rev mutant in which Asn40->Gln allowed
recognition of a mutant RRE IIB site containing an A73->G
substitution but abolished binding to the wild-type site
(Jain et al., 1996). In the Rev peptide-RRE IIB complex,
Asn40 hydrogen bonds to a non-Watson-Crick G47-A73 base
pair (Battiste et al., 1996; Ye et al., 1996). Upon
examining the sequences of the selected
glutamine-containing peptides (Figure 7), we observed
that the other non-arginine residues were rather variable
and therefore suspected that glutamine within a
polyarginine context might be sufficient to mediate
high-affinity RRE binding. We constructed a variant,
R6QR7, in which glutamine was placed at the equivalent
of position 40 in an otherwise all-arginine background
and measured activation of the RRE IIB-CAT reporter.
Remarkably, this variant was substantially more active
than the Rev peptide, and equally remarkably, a variant
containing asparagine at the same position, R6NR7, was
inactive (Figure 8B). The opposite result was obtained
in the context of the Rev sequence; the wild-type
peptide, which contains asparagine, was active whereas
the glutamine mutant was inactive, as reported (Jain et
al., 1996). Thus, the context in which asparagine or
glutamine is presented to the RRE is critical.

The arginine-rich RNA-binding motif, first
identified in bacteriophage antiterminator proteins,
ribosomal proteins, and several retroviral proteins
including HIV Tat and Rev (Lazinski et al., 1989),
appears to provide an excellent framework for designing
novel RNA-binding molecules. Here we have explored the
hypothesis that specific binders can be readily evolved
from a polyarginine peptide by screening a combinatorial
library in which four positions were randomized within a
sequence of fourteen arginines. We identified several
peptides that bind tightly to the RRE and found that a
single glutamine within a polyarginine framework mediated
tight binding.

In the Rev peptide, which binds to the major
groove of the RRE IIB site in an α-helical conformation
(Tan et al., 1993; Battiste et al., 1996; Ye et al.,
1996), Asn40 makes a specific contact to a G47-A73 base
pair, with its amide group donating a hydrogen bond to
the N7 group of G47 and accepting a hydrogen bond from
the N6 group of A73. We propose that the amide group of
glutamine makes a similar contact to the G-A pair in the
selected peptides. Remarkably, no RRE-binding activity
was observed when glutamine was replaced by asparagine in
the polyarginine context, and conversely, no activity was
observed when asparagine was replaced by glutamine in the
Rev peptide context. The orientation and depth of
penetration of the Rev α-helix in the major groove is
determined by a set of contacts to functional groups on
the bases and to backbone groups, with several contacts
involving arginine side chains. Clearly, the Rev-RRE
arrangement accommodates the coplanar orientation between
Asn40 and the G-A pair, and we infer that the extra
methylene group of glutamine cannot be accommodated. In
contrast, in the polyarginine context we imagine that the
peptide remains helical but is oriented less deeply in
the major groove, perhaps because additional arginines
cannot be accommodated at the tight RNA-peptide interface
or because additional contacts are made, and glutamine
has the appropriate length to contact the G-A pair. The
observation that the glutamine-containing peptides bind
more tightly than the Rev peptide suggests that the
presumed new orientation may be more energetically
favorable. Detailed structural information is clearly
needed to establish the basis for improved binding. In
DNA-protein interactions, coplanar amino acid-base
arrangements are commonly seen in which arginines form
two hydrogen bonds to the guanines of G:C base pairs and
glutamine or asparagine form two hydrogen bonds to the
adenines of A:T pairs, as originally proposed by Seeman




et al., 1976. It appears that glutamine or asparagine
side chains are also well-suited to form coplanar
hydrogen-bonded interactions with G-A base pairs, which
may be common to RNA structures (Wyatt et al., 1993). An
Asn40->Gln mutation was identified in Rev as a
change-of-specificity mutant that bound a mutant RRE (A73->G) with
high affinity but did not bind the wild-type RRE (Jain et
al., 1996), consistent with our results. Since the G-A
pair is mutated, it seems reasonable that glutamine docks
differently to this mutant RRE site.

Arginine-rich peptides with different
sequences and conformations than Rev have been identified
that bind specifically to RRE IIB and these have been
evolved into tighter binders than Rev. RNA aptamers have
been identified by in vitro selection that bind Rev with
10-fold higher affinities than IIB (Symensma et al.,
1996). Apparently the affinity of the Rev peptide-RRE
IIB interaction has not been optimized during viral
evolution. Since Rev is essential for HIV replication
and Rev binding to the RRE is essential for function,
tight RRE-binding peptides might be used to block the
interaction and thereby inhibit viral replication.
Preliminary experiments suggest that Rev function can be
inhibited by the peptides described here, possibly
providing new leads for drug discovery. The mammalian
screening system may be viewed as a tool to help identify
interesting and potentially useful RNA-binding molecules.

Although the foregoing invention has been
described in detail for purposes of clarity of
understanding, it will be obvious that certain
modifications may be practiced within the scope of the
appended claims. All publications and patent documents
cited above are hereby incorporated by reference in their
entirety for all purposes to the same extent as if each
were so individually denoted.

What is claimed is:

1. A method of screening a library of nucleic
acid fragments in eucaryotic cells, comprising:

(1) transforming the library of nucleic acid
fragments into primary cells, wherein the cells are
procaryotic or fungi;

(2) culturing the primary cells under
conditions whereby the copy number of nucleic acid
fragments is amplified to an average of at least 200
copies per transformed cell;

(3) contacting the transformed primary cells
with a population of eucaryotic cells under conditions
whereby outer surfaces of the transformed primary cells
and eucaryotic cells fuse and contents of the transformed
primary cells including at least some of the library of
nucleic acid fragments are transferred to the eucaryotic
cells;

(4) screening the nucleic acid fragments in
the eucaryotic cells to isolate one or more eucaryotic
cells having a desired property conferred by one or more
members of the library of nucleic acid fragments, an
expression product thereof, or a secondary metabolite of
an expression product;

(5) lysing the one or more eucaryotic cells
to release the one or more members of the library of
nucleic acid fragments and electroporating the nucleic
acid fragments into further procaryotic cells;

(5) propagating the further procaryotic cells
to amplify the one or more members of the library of
nucleic acid fragments, which confer the desired
property.

2. The method of claim 1, wherein the primary
cells are E. coli, the nucleic acid fragments are
contained in a ColE1 vector, and the primary cells are
cultured in the presence of an antibiotic to amplify the
copy number of the library of nucleic acid fragments.

3. The method of claim 1, wherein the
eucaryotic cells lack capacity for episomal replication
of the transferred nucleic acid fragments.

4. The method of claim 1, further comprising
isolating a nucleic acid fragment from the amplified
further procaryotic cells, which confers the desired
property.

5. The method of claim 1, further comprising
repeating steps (2)-(5), wherein the amplified further
procaryotic cells in step (5) of a previous cycle form
the transformed primary cells in step (2) of the next
cycle.

6. The method of claim 1, wherein the library
of nucleic acid fragments has at least 10^ members.

7. The method of claim 1, wherein at least
ten different members of the library of nucleic acid
fragments are electroporated from the one or more
eucaryotic cells to the further procaryotic cells.


8. The method of claim 1, wherein the library
of nucleic acid fragments are expressed in the eucaryotic
cells before the screening.

9. The method of claim 1, wherein the nucleic
acid fragments encode different peptides, and one or more
of the peptides confers the desired property in the
eucaryotic cells.

10. The method of claim 1, wherein the
nucleic acid fragments encode enzymes, which produce
secondary metabolites in the procaryotic cells, which are
transferred together with at least some of the nucleic
acid fragments to the eucaryotic cells, and one or more
of the secondary metabolites confers the desired property
in the eucaryotic cells.

11. The method of claim 9, wherein, in the
screening step, the eucaryotic cells contain a construct
encoding a reporter enzyme operably linked to a
regulatory sequence, and the one or more peptides confers
the desired property by binding to the regulatory
sequence or a transcript thereof inducing expression of
the reporter enzyme.

12. The method of claim 11, wherein the
transformed primary cells comprise the construct which is
transferred into the eucaryotic cells with the library of
nucleic acid fragments.

13. The method of claim 11, wherein the
reporter is GFP.

14. The method of claim 12, wherein a
transcript of the regulatory sequence is recognized by a
site-specific RNA binding peptide and the nucleic acid
fragments encode the RNA binding peptide.

15. The method of claim 15, wherein the
nucleic acid fragments encode fusion proteins comprising
an RNA binding peptide and a transcriptional inducer, the
RNA binding peptide differing among fusion proteins.

16. The method of claim 16 wherein the
construct encodes GFP operably linked to a promoter induced
by HIV-1 TAT and an RNA binding site and the
transcriptional inducer is a HIV-1 TAT polypeptide.

17. The method of claim 4, further comprising
determining the sequence of the isolated member of the
nucleic acid fragment library.
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18. The method of claim 17, further 
comprising producing the expression product of the 
isolated member of the nucleic acid fragment library, 

5 19. The method of claim 18, further 

comprising formulating the expression product as a 
therapeutic composition. 

20. The method of claim 1, wherein the 

10 nucleic acid fragments are members of a natural library. 

21. The method of claim 1, wherein the 
nucleic acid fragments are members of a randomized 
library. 

22. A method of screening a library of 
nucleic acid fragments in eucaryotic cells, comprising: 

(1) transforming the library of nucleic acid 
fragments into primary cells, wherein the cells are 
procaryotic or fungi; 

(2) contacting the transformed primary cells 
with a population of eucaryotic cells under conditions 
whereby outer surfaces of the transformed primary cells 
and eucaryotic cells fuse and contents of the transformed 
primary cells including at least some of the library of 
nucleic acid fragments are transferred to the eucaryotic 
cells; 

(3) screening the nucleic acid fragments in 
the eucaryotic cells without substantial replication of 
the nucleic acid fragments in the eucaryotic cells to 
isolate one or more eucaryotic cells having a desired 
property conferred by one or more members of the library 
of nucleic acid fragments, an expression product thereof, 
or a secondary metabolite of an expression product; 

(4) transferring the one or more members of 
the library of nucleic acid fragments into further 
procaryotic cells; 

(5) propagating the further procaryotic cells 
to amplify the one or more members of the library of 
nucleic acid fragments, which confer the desired 
property. 

23. A method of screening a library of 
nucleic acid fragments in eucaryotic cells, comprising: 

(1) transforming the library of nucleic acid 
fragments into cells, wherein the cells are procaryotic 
cells or fungi, and culturing the cells under conditions 
in which expression products and/or secondary metabolites 
of expression products are produced; 

(2) contacting the transformed cells with a 
population of eucaryotic cells under conditions whereby 
outer surfaces of the transformed procaryotic and 
eucaryotic cells fuse and contents of the transformed 
procaryotic cells including at least some of the library 
of nucleic acid fragments, expression products thereof 
and/or secondary metabolites of expression products are 
transferred to the eucaryotic cells, whereby at least 
some eucaryotic cells receive an expression product 
and/or a secondary metabolite thereof and a nucleic acid 
fragment encoding the expression product; 

(3) screening the eucaryotic cells to isolate 
one or more eucaryotic cells having a desired property 
conferred by one or more of the expression products or 
one or more of the secondary metabolites produced in the 
primary cells; 

(4) transforming the one or more members of 
the library of nucleic acid fragments from the one or 
more eucaryotic cells into further procaryotic cells; 

(5) propagating the transformed further 
procaryotic cells to amplify the one or more members of 
the library of nucleic acid fragments, which produce the 
one or more expression products and/or one or more 
secondary metabolites that confer the desired property. 

24. A method of screening for RNA binding 
peptides in eucaryotic cells, comprising: 

introducing into a population of eucaryotic 
cells a library of nucleic acid fragments encoding fusion 
proteins; a fusion protein comprising a peptide linked to 
a transcriptional inducer, the peptides varying between 
fusion proteins; 

wherein the eucaryotic cells further comprise 
a construct encoding a reporter gene operably linked to a 
promoter from which expression is stimulated by the 
transcriptional inducer and an RNA binding site; 

whereby one or more fusion proteins, each 
comprising a peptide having specific affinity for the RNA 
binding site, bind to the RNA binding site of the reporter 
construct or a transcript thereof via the peptide, and 
the transcriptional inducer linked to the peptide 
stimulates expression of the reporter gene from the 
promoter; 

isolating one or more eucaryotic cells with 
stimulated expression of the reporter gene, the one or 
more cells containing one or more nucleic acid fragments 
encoding the one or more fusion proteins comprising a 
peptide having specific affinity for the RNA binding 
site. 

25. The method of claim 24, wherein the 
introducing step comprises: 

(1) transforming the library of nucleic acid 
fragments into primary cells, which are procaryotic cells 
or fungi; 

(2) contacting the transformed primary cells 
with the population of eucaryotic cells under conditions 
whereby outer surfaces of the transformed primary and 
eucaryotic cells fuse and contents of the transformed 
primary cells including at least some of the library of 
nucleic acid fragments are transferred to the eucaryotic 
cells; 

and the method further comprises 

(4) transforming nucleic acid fragments from 
the isolated one or more eucaryotic cells into further 
procaryotic cells; 

(5) propagating the transformed further 
procaryotic cells to amplify one or more members of the 
library of nucleic acid fragments encoding fusion 
proteins comprising a peptide with specific affinity for 
the RNA binding site. 

26. The method of claim 25, wherein the 
primary cells are E. coli, the nucleic acid fragments are 
contained in a ColE1 vector, and the primary cells are 
cultured in the presence of an antibiotic to amplify the 
copy number of the library of nucleic acid fragments. 

27. The method of claim 26, wherein the 
eucaryotic cells lack capacity for episomal replication 
of the transferred nucleic acid fragments. 

28. The method of claim 25, wherein the 
transcriptional inducer is a HIV TAT polypeptide and the 
promoter is a HIV LTR promoter. 

29. The method of claim 28, wherein the RNA 
binding site is a HIV RRE site. 

30. The method of claim 29, wherein the HIV 
TAT polypeptide lacks a natural HIV Tat RNA binding 
domain. 

31. The method of claim 24, wherein the 
construct is transferred to the eucaryotic cells as a 
component of the contents of the primary cells. 

32. The method of claim 24, further 
comprising synthesizing the peptide having 
specific affinity for the RNA binding site. 

33. The method of claim 24, further 
comprising formulating the peptide having specific 
affinity for the RNA binding site in a therapeutic or 
diagnostic composition. 

34. A method of screening RNA sequences for 
specific affinity to a selected peptide in eucaryotic 
cells, comprising: 

introducing into a population of eucaryotic 
cells a library of variant forms of a first construct, 
the first construct encoding a reporter gene operably 
linked to a promoter from which expression is stimulated 
by a transcriptional inducer and a potential RNA binding 
site, which varies between the variant forms; 

wherein the eucaryotic cells further comprise 
a second construct encoding a fusion protein comprising 
a transcriptional inducer linked to the selected peptide; 

whereby the fusion protein binds to one or 
more variant forms of the first construct or transcripts 
thereof, each having a potential RNA binding site with 
specific affinity for the peptide, stimulating expression 
of the reporter gene from the promoter; 

isolating one or more eucaryotic cells with 
stimulated expression of the reporter gene, the one or 
more cells containing one or more variant forms of the 
first construct encoding one or more potential RNA 
binding sites with specific affinity for the selected 
peptide. 